Abstract

We study regular expressions that use variables, or parameters, which are interpreted as
alphabet letters. We consider two classes of languages denoted by such expressions: under the
possibility semantics, a word belongs to the language if it is denoted by some regular expression
obtained by replacing variables with letters; under the certainty semantics, the word must be
denoted by every such expression. Such languages are regular, and we show that they naturally
arise in several applications such as querying graph databases and program analysis. As the main
contribution of the paper, we provide a complete characterization of the complexity of the main
computational problems related to such languages: nonemptiness, universality, containment,
membership, as well as the problem of constructing NFAs capturing such languages. We also
look at the extension when domains of variables could be arbitrary regular languages, and show
that under the certainty semantics, languages remain regular and the complexity of the main
computational problems does not change.

1 Introduction

In this paper we study parameterized regular expressions like $(0x)^*1(xy)^*$ that combine letters from
a finite alphabet $\Sigma$, such as 0 and 1, and variables, such as $x$ and $y$. These variables are interpreted
as letters from $\Sigma$. This gives two ways of defining the language of words over $\Sigma$ denoted by a
parameterized regular expression $e$. Under the first (possibility) semantics, a word $w \in \Sigma^*$ is in
the language $\mathcal{L}_\Diamond(e)$ if $w$ is in the language of some regular expression $e'$ obtained by substituting
alphabet letters for variables. Under the second (certainty) semantics, $w \in \mathcal{L}_\Box(e)$ if $w$ is in
the language of all regular expressions obtained by substituting alphabet letters for variables. For
example, if $e = (0x)^*1(xy)^*$, then $01110 \in \mathcal{L}_\Diamond(e)$, as witnessed by the substitution $x \mapsto 1$, $y \mapsto 0$.
The word 1 is in $\mathcal{L}_\Box(e)$, since the starred subexpressions can be replaced by the empty word. As a
more involved example of the certainty semantics, the reader can verify that for $e' = (0|1)^*xy(0|1)^*$,
the word 10011 is in $\mathcal{L}_\Box(e')$, although no word of length less than 5 can be in $\mathcal{L}_\Box(e')$.
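
The examples above can be checked mechanically. The following is a minimal brute-force sketch, not an algorithm from the paper: it assumes variables are single characters that do not clash with regular-expression syntax, writes expressions in Python's `re` syntax, and uses the hypothetical helper names `valuations`, `instantiate`, `possibly` and `certainly`.

```python
import itertools
import re


def valuations(variables, alphabet):
    """Yield every mapping from variables to single alphabet letters."""
    for letters in itertools.product(alphabet, repeat=len(variables)):
        yield dict(zip(variables, letters))


def instantiate(expr, valuation):
    """Replace each variable occurrence in expr by its assigned letter."""
    return "".join(valuation.get(ch, ch) for ch in expr)


def possibly(expr, variables, alphabet, word):
    """True if word matches the instantiated expression for SOME valuation."""
    return any(re.fullmatch(instantiate(expr, v), word)
               for v in valuations(variables, alphabet))


def certainly(expr, variables, alphabet, word):
    """True if word matches the instantiated expression for EVERY valuation."""
    return all(re.fullmatch(instantiate(expr, v), word)
               for v in valuations(variables, alphabet))


if __name__ == "__main__":
    print(possibly("(0x)*1(xy)*", "xy", "01", "01110"))
    print(certainly("(0|1)*xy(0|1)*", "xy", "01", "10011"))
```

The enumeration is exponential in the number of variables, so this is only meant for experimenting with small expressions.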

These semantics of parameterized regular expressions arise in a variety of applications. In fact, 
the possibility semantics has already been studied in the case of infinite alphabets [11], with the 
motivation coming from the study of infinite-state systems with finite control (e.g., software with 
integer parameters). We, on the other hand, are interested in the classical case of finite alphabets, 
typically considered in connection with formal languages, and both the possibility and the certainty 
semantics. These are motivated by several applications, in particular in the fields of querying graph- 
structured data, and static analysis of programs. We now explain these connections. 



Applications in graph databases. Graph databases, which describe both data and its topology,
have been actively studied over the past few years in connection with such diverse topics as social 
networks, biological data, semantic Web and RDF, crime detection and analyzing network traffic; 
see [2] for a survey. The abstract data model is essentially an edge-labeled graph, with edge labels 
coming from a finite alphabet. This finite alphabet can contain, for example, types of relationships 
in a social network or a list of RDF properties. In this setting one concentrates on various types of 
reachability queries, e.g., queries that ask for the existence of a path between nodes with certain 
properties so that the label of the path forms a word in a given regular language [3, 6, 7, 9]. 

As in most data management applications, it is common that some information is missing,
typically due to using data that is the result of another query or transformation [1, 4, 8]. For
example, in a social network we may have edges $a \xrightarrow{x} b$ and $a' \xrightarrow{x} b'$, saying that the relationship
between $a$ and $b$ is the same as that between $a'$ and $b'$. However, the precise nature of such a
relationship is unknown, and this is represented by a variable $x$. Such graphs $G$ whose edges are
labeled by letters from $\Sigma$ and variables from a set $\mathcal{W}$ can be viewed as automata over $\Sigma \cup \mathcal{W}$. In
checking the existence of paths between nodes, one normally looks for certain answers [14], i.e.,
answers independent of a particular interpretation of variables.

In the case of graph databases such certain answers can be found as follows. Let $a, b$ be two
nodes of $G$. One can view $(G, a, b)$ as an automaton, with $a$ as the initial state and $b$ as the final
state; its language, over $\Sigma \cup \mathcal{W}$, is given by some regular expression $e(G, a, b)$. Then we can be
certain about the existence of a word $w$ from some language $L$ that is the label of a path from $a$
to $b$ iff $w$ also belongs to $\mathcal{L}_\Box(e(G, a, b))$, i.e., iff $L \cap \mathcal{L}_\Box(e(G, a, b))$ is nonempty. Hence, computing
$\mathcal{L}_\Box(e)$ is essential for answering queries over graph databases with missing information.

Applications in program analysis. That regular expressions with variables appear naturally
in program analysis tasks was noticed, for instance, in [18, 19, 20]. One uses the alphabet that
consists of symbols related to operations on variables, pointers, or files, e.g., def for defining a
variable, use for using it, open for opening a file, or malloc for allocating a pointer. A variable
then follows: def(x) means defining variable x. While variables and alphabet symbols do not mix
freely any more, it is easy to enforce correct syntax with an automaton. An example of a regular
condition with parameters is searching for uninitialized variables: $(\neg \mathit{def}(x))^* \, \mathit{use}(x)$.

Expressions like this are evaluated on a graph that serves as an abstraction of a program.
One considers two evaluation problems: whether, under some evaluation of variables, either some
path or every path between two nodes satisfies it. This amounts to computing $\mathcal{L}_\Diamond(e)$ and checking
whether all paths, or some path, between nodes is in that language. In the case of uninitialized variables
one would be using the 'some path' semantics; the need for the 'all paths' semantics arises when one
analyzes locking disciplines or constant folding optimizations [18, 20]. So in this case the language
of interest for us is $\mathcal{L}_\Diamond(e)$, as one wants to check whether there is an evaluation of variables for
which some property of a program is true.

Parameterized regular expressions appeared in other applications as well, e.g., in phase-sequence
prediction for dynamic memory allocation [22], as a compact way to express a family of legal
behaviors in hardware verification [5], or as a tool to state regular constraints in constraint
satisfaction problems [21].

At the same time, however, very little is known about the basic properties of the languages
$\mathcal{L}_\Box(e)$ and $\mathcal{L}_\Diamond(e)$. As we mentioned, the $\mathcal{L}_\Diamond(e)$ semantics has been studied in the context of
infinite alphabets [11]. It was shown that it defines languages that can also be accepted by
nondeterministic register automata of [16]. Language-theoretic issues are quite different over finite and
infinite alphabets. In the case of infinite alphabets getting closure properties is nontrivial, and
some problems, such as universality and containment, are even undecidable [11]. In contrast, in the
classical language-theoretic framework of finite alphabets, closure and decidability are guaranteed,
and the key question is to pinpoint the precise complexity of the main decision problems.

Thus, our main goal is to determine the exact complexity of the key problems related to
languages $\mathcal{L}_\Box(e)$ and $\mathcal{L}_\Diamond(e)$. We consider the standard language-theoretic decision problems, such
as membership of a word in the language, language nonemptiness, universality, and containment.
Since the languages $\mathcal{L}_\Box(e)$ and $\mathcal{L}_\Diamond(e)$ are regular, we also consider the complexity of constructing
NFAs, over the finite alphabet $\Sigma$, that define them.

For all the decision problems, we find complexity classes they belong to. In fact, all the
problems end up being complete for various complexity classes, from NLogspace to Expspace. We
establish upper bounds on the running time of algorithms for constructing NFAs, and then prove
matching lower bounds for the sizes of NFAs representing $\mathcal{L}_\Box(e)$ and $\mathcal{L}_\Diamond(e)$. Finally, we look at
some extensions where the range of variables need not be just $\Sigma$. Under the possibility semantics,
such languages subsume pattern (and even multi-pattern [15]) languages, but under the certainty
semantics they remain regular, and we establish complexity bounds on the main problems.

Organization. Parameterized regular expressions and their languages are formally defined in
Section 2. In Section 3 we define the main problems we study. Complexity of the main decision 
problems is analyzed in Section 4, and complexity of automata construction in Section 5. In Section 
6 we study extensions when domains of variables need not be single letters. In the first 10 pages of 
the paper we state the main results and provide quick sketches of the proofs; detailed proofs of all 
the results are in the appendix. 

2 Preliminaries 

Let $\Sigma$ be a finite alphabet, and $\mathcal{V}$ a countably infinite set of variables, disjoint from $\Sigma$. Regular
expressions over $\Sigma \cup \mathcal{V}$ will be called parameterized regular expressions. Regular expressions, as
usual, are built from $\emptyset$, the empty word $\varepsilon$, symbols in $\Sigma$ and $\mathcal{V}$, by operations of concatenation ($\cdot$),
union ($|$), and the Kleene star ($^*$). Of course each such expression only uses finitely many symbols
in $\mathcal{V}$. The size of a regular expression is measured as the total number of symbols needed to write
it down (or as the size of its parse tree).

We write $\mathcal{L}(e)$ for the language defined by a regular expression $e$. If $e$ is a parameterized regular
expression that uses variables from a finite set $\mathcal{W} \subset \mathcal{V}$, then $\mathcal{L}(e) \subseteq (\Sigma \cup \mathcal{W})^*$. We are interested
in languages $\mathcal{L}_\Box(e)$ and $\mathcal{L}_\Diamond(e)$, which are subsets of $\Sigma^*$. To define them, we need the notion of
a valuation $\nu$, which is a mapping from $\mathcal{W}$ to $\Sigma$, where $\mathcal{W}$ is the set of variables mentioned in $e$.
By $\nu(e)$ we mean the regular expression over $\Sigma$ obtained from $e$ by simultaneously replacing each
variable $x \in \mathcal{W}$ by $\nu(x)$. For example, if $e = (0x)^*1(xy)^*$ and $\nu$ is given by $x \mapsto 1$, $y \mapsto 0$, then
$\nu(e) = (01)^*1(10)^*$.

We now formally define the certainty and possibility semantics for parameterized regular
expressions.

Definition 1 (Acceptance) Let $e$ be a parameterized regular expression. Then

$\mathcal{L}_\Box(e) := \bigcap \{ \mathcal{L}(\nu(e)) \mid \nu \text{ is a valuation for } e \}$ (certainty semantics)

$\mathcal{L}_\Diamond(e) := \bigcup \{ \mathcal{L}(\nu(e)) \mid \nu \text{ is a valuation for } e \}$ (possibility semantics)

Since each parameterized regular expression uses finitely many variables, the number of possible
valuations is finite as well, and thus both $\mathcal{L}_\Box(e)$ and $\mathcal{L}_\Diamond(e)$ are regular languages over $\Sigma$.

The usual connection between regular expressions and automata extends to the parameterized
case. Each parameterized regular expression $e$ over $\Sigma \cup \mathcal{W}$, where $\mathcal{W}$ is a finite set of variables in $\mathcal{V}$,
can of course be translated, in polynomial time, into an NFA $A_e$ over $\Sigma \cup \mathcal{W}$ so that $\mathcal{L}(A_e) = \mathcal{L}(e)$.
Such equivalences extend to $\mathcal{L}_\Box$ and $\mathcal{L}_\Diamond$. Namely, for an NFA $A$ over $\Sigma \cup \mathcal{W}$, and a valuation
$\nu : \mathcal{W} \to \Sigma$, define $\nu(A)$ as the NFA over $\Sigma$ that is obtained from $A$ by replacing each transition
of the form $(q, x, q')$ in $A$ (for $q, q'$ states of $A$ and $x \in \mathcal{W}$) with the transition $(q, \nu(x), q')$. The
following is just an easy observation:

Lemma 1 Let $e$ be a parameterized regular expression, and $A_e$ be an NFA over $\Sigma \cup \mathcal{V}$ such that
$\mathcal{L}(A_e) = \mathcal{L}(e)$. Then, for every valuation $\nu$, we have $\mathcal{L}(\nu(e)) = \mathcal{L}(\nu(A_e))$.

Hence, if we define $\mathcal{L}_\Box(A)$ as $\bigcap_\nu \mathcal{L}(\nu(A))$, and $\mathcal{L}_\Diamond(A)$ as $\bigcup_\nu \mathcal{L}(\nu(A))$, then the lemma implies that
$\mathcal{L}_\Box(e) = \mathcal{L}_\Box(A_e)$ and $\mathcal{L}_\Diamond(e) = \mathcal{L}_\Diamond(A_e)$. Since one can go from regular expressions to NFAs in
polynomial time, this will allow us to use both automata and regular expressions interchangeably
to establish our results.
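
The relabeling of transitions described above can be sketched directly. This is an illustrative sketch only: it assumes an NFA without $\varepsilon$-transitions given as a set of triples `(q, label, q')`, and the names `apply_valuation`, `accepts`, `in_certain`, `in_possible` and the example automaton `EXAMPLE_DELTA` (a hand-built NFA for $(0x)^*1(xy)^*$) are hypothetical.

```python
from itertools import product


def apply_valuation(delta, valuation):
    """Return the transitions of nu(A): relabel every (q, x, q') with nu(x)."""
    return {(q, valuation.get(label, label), r) for (q, label, r) in delta}


def accepts(delta, initial, finals, word):
    """Simulate an epsilon-free NFA on word by tracking the set of current states."""
    current = {initial}
    for letter in word:
        current = {r for (q, a, r) in delta if q in current and a == letter}
    return bool(current & finals)


def _valuations(variables, alphabet):
    for letters in product(alphabet, repeat=len(variables)):
        yield dict(zip(variables, letters))


def in_certain(delta, initial, finals, variables, alphabet, word):
    """word is accepted by nu(A) for every valuation nu."""
    return all(accepts(apply_valuation(delta, v), initial, finals, word)
               for v in _valuations(variables, alphabet))


def in_possible(delta, initial, finals, variables, alphabet, word):
    """word is accepted by nu(A) for at least one valuation nu."""
    return any(accepts(apply_valuation(delta, v), initial, finals, word)
               for v in _valuations(variables, alphabet))


# Hypothetical NFA for (0x)*1(xy)*: state 0 is initial, state 2 is final.
EXAMPLE_DELTA = {(0, "0", 1), (1, "x", 0), (0, "1", 2), (2, "x", 3), (3, "y", 2)}
```

For instance, `in_possible(EXAMPLE_DELTA, 0, {2}, "xy", "01", "01110")` checks the introductory example on the automaton side rather than on the expression side.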

3 Basic Problems 

We now describe the main problems we study here. For each problem we shall have two versions,
depending on which semantics, $\mathcal{L}_\Box$ or $\mathcal{L}_\Diamond$, is used. So each problem will have a subscript $*$ that
can be interpreted as $\Box$ or $\Diamond$.

We start with decision problems: 

Nonemptiness$_*$: Given a parameterized regular expression $e$, is $\mathcal{L}_*(e) \neq \emptyset$?

Membership$_*$: Given a parameterized regular expression $e$ and a word $w \in \Sigma^*$, is $w \in \mathcal{L}_*(e)$?

Universality$_*$: Given a parameterized regular expression $e$, is $\mathcal{L}_*(e) = \Sigma^*$?

Containment$_*$: Given parameterized regular expressions $e_1$ and $e_2$, is $\mathcal{L}_*(e_1) \subseteq \mathcal{L}_*(e_2)$?

A special version of nonemptiness is the problem of intersection with a regular language (used
in the database querying example in the introduction):

NonemptyIntReg$_*$: Given a parameterized regular expression $e$, and a regular expression $e'$ over
$\Sigma$, is $\mathcal{L}(e') \cap \mathcal{L}_*(e) \neq \emptyset$?

The last problem is computational rather than a decision problem:

ConstructNFA$_*$: Given a parameterized regular expression $e$, construct an NFA $A$ over $\Sigma$ such
that $\mathcal{L}_*(e) = \mathcal{L}(A)$.



4 Decision problems 

In this section we consider the five decision problems - nonemptiness, membership, universality, 
containment, as well as nonemptiness of intersection with a regular language - and provide precise 
complexity for them. 

We shall also consider two restrictions on regular expressions; these will indicate when the 
problems are inherently very hard or when their complexity can be lowered in some cases. One 
source of complexity is the repetition of variables in expressions like $(0x)^*1(xy)^*$. When no variable
appears more than once in a parameterized regular expression, we call it simple. Another source 
of complexity is infinite languages, so we consider a restriction to expressions of star-height 0, in 
which no Kleene star is used: these denote finite languages, and each finite language is denoted by 
such an expression. 

4.1 Nonemptiness 

The problem Nonemptiness$_\Diamond$ has a trivial solution, since $\mathcal{L}_\Diamond(e) \neq \emptyset$ for every parameterized
regular expression $e$ (except $e = \emptyset$). So we study this problem for the certainty semantics; for the
possibility semantics, we look at the related problem Nonemptiness-Automata$_\Diamond$, which, for a
given NFA $A$ over $\Sigma \cup \mathcal{V}$, asks whether $\mathcal{L}_\Diamond(A) \neq \emptyset$.

Theorem 1 • The problem Nonemptiness$_\Box$ is Expspace-complete.
• The problem Nonemptiness-Automata$_\Diamond$ is NLogspace-complete.

The result for the possibility semantics is by a standard reachability argument. Note that the
bound is the same here as in the case of infinite alphabets studied in [11]. To see the upper bound
for Nonemptiness$_\Box$, note that there are exponentially many valuations $\nu$, and each automaton
$\nu(A_e)$ is of polynomial size, so we can use the standard algorithm for checking nonemptiness of the
intersection of a family of regular languages, which runs in polynomial space in the size of
its input; since the input to this problem is of exponential size in terms of the original
input, the Expspace bound follows. The hardness is by a generic (Turing machine) reduction; the
proof is in the appendix.

In the proof we use the following property of the certainty semantics, which shows a striking 
difference with the case of standard regular expressions: 

Lemma 2 Given a set $e_1, \ldots, e_k$ of parameterized expressions of size at most $n \geq k$, it is possible to
build, in time $O(k \cdot n)$, an expression $e'$ such that $\mathcal{L}_\Box(e')$ is empty if and only if $\mathcal{L}_\Box(e_1) \cap \cdots \cap \mathcal{L}_\Box(e_k)$
is empty.

The reason the case of the $\mathcal{L}_\Box(e)$ semantics is so different from the usual semantics of regular
languages is as follows. It is well known that checking whether the intersection of the languages
defined by a finite set $S$ of regular expressions is nonempty is Pspace-complete [17], and hence
under widely held complexity-theoretical assumptions no regular expression $r$ can be constructed in
polynomial time from $S$ such that $\mathcal{L}(r)$ is nonempty if and only if $\bigcap_{s \in S} \mathcal{L}(s)$ is nonempty. Lemma
2, on the other hand, says that such a construction is possible for parameterized regular expressions
under the certainty semantics.

The generic reduction used in the proof of Expspace-hardness of Nonemptiness$_\Box$ also provides
lower bounds on the minimal sizes of words in languages $\mathcal{L}_\Box(e)$ (note that the language $\mathcal{L}_\Diamond(e)$ always
contains a word of size linear in the size of $e$).



Corollary 1 There exists a polynomial $p : \mathbb{N} \to \mathbb{N}$ and a sequence of parameterized regular
expressions $\{e_n\}_{n \in \mathbb{N}}$ such that each $e_n$ is of size at most $p(n)$, and every word in the language $\mathcal{L}_\Box(e_n)$ has
size at least $2^{2^n}$.

The construction is somewhat involved, but it is easy to see the single-exponential bound
(which was hinted at in the first paragraph of the introduction, and which was in fact used in
connection with querying incomplete graph data in [4]). For each $n$, consider an expression $e_n =
(0|1)^* x_1 \cdots x_n (0|1)^*$. If a word $w$ is in $\mathcal{L}_\Box(e_n)$, then $w$ must contain every word in $\{0,1\}^n$ as a
subword; since a word of length $m$ has at most $m - n + 1$ subwords of length $n$, its length must be at least $2^n + n - 1$.
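
For small $n$ the shortest word in the certainty language of $(0|1)^* x_1 \cdots x_n (0|1)^*$ can be found by exhaustive search. The sketch below relies on the characterization stated above (a word is in the language iff it contains every word of $\{0,1\}^n$ as a contiguous subword); the function names are hypothetical, and the search is only feasible for very small $n$.

```python
from itertools import product


def contains_all_blocks(word, n):
    """True if every binary word of length n occurs as a contiguous subword."""
    needed = {"".join(p) for p in product("01", repeat=n)}
    seen = {word[i:i + n] for i in range(len(word) - n + 1)}
    return needed <= seen


def shortest_certain_length(n, max_len):
    """Length of a shortest word containing all n-blocks, or None if none has length <= max_len."""
    for length in range(max_len + 1):
        for letters in product("01", repeat=length):
            if contains_all_blocks("".join(letters), n):
                return length
    return None


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, shortest_certain_length(n, 2 ** n + n))
```

For $n = 2$ this reproduces the length-5 bound mentioned for the word 10011 in the introduction.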

We can also show that the use of Kleene star has a huge impact on complexity, which is not at 
the same time affected by variable repetitions. 

Proposition 1 The problem Nonemptiness$_\Box$ remains Expspace-hard over the class of simple
regular expressions, but it is $\Sigma_2^p$-complete over the class of expressions of star-height 0.

4.2 Membership 

It is easy to see that Membership$_\Box$ can be solved in coNP, and Membership$_\Diamond$ in NP: one just
guesses a valuation $\nu$ witnessing $w \notin \mathcal{L}(\nu(e))$ (to refute certainty) or $w \in \mathcal{L}(\nu(e))$ (to establish
possibility). These bounds turn out to be tight.

Theorem 2 • The problem Membership$_\Box$ is coNP-complete.
• The problem Membership$_\Diamond$ is NP-complete.

Note that for the case of the possibility semantics, the bound is the same as for languages over 
the infinite alphabets [11] (for all problems other than nonemptiness and membership, the bounds 
will be different). The hardness proof in [11] relies on the infinite size of the alphabet, but one can 
find an alternative proof that uses only finitely many symbols. Both proofs are by variations of 
3-SAT or its complement; the reductions are somewhat involved and are presented in the appendix. 

The restrictions to expressions without repetitions, or to finite languages, by themselves do not 
lower the complexity, but together they make it polynomial. 

Proposition 2 The complexity of the membership problem remains as in Theorem 2 over the
classes of simple expressions, and expressions of star-height 0. Over the class of simple expressions
of star-height 0, Membership$_\Diamond$ can be solved in time $O(nm \cdot \log^2 n)$, where $n$ is the size of the
expression and $m$ is the size of the word.

The logarithmic factor appears due to the complexity of the algorithm for converting regular
expressions into $\varepsilon$-free NFAs [13]; it appears as one of the steps of the algorithm, which is described in
the appendix.

Membership for fixed words. We next consider a variation of the membership problem:
Membership$_*$($w$) asks, for a parameterized regular expression $e$, whether $w \in \mathcal{L}_*(e)$. In other
words, $w$ is fixed. It turns out that for the $\Diamond$-semantics, this version is efficiently solvable, but for
the $\Box$-semantics, it remains intractable unless restricted to simple expressions.

Theorem 3 • There is a word $w \in \Sigma^*$ such that the problem Membership$_\Box$($w$) is coNP-hard
(even over the class of expressions of star-height 0).

• For each word $w \in \Sigma^*$, the problem Membership$_\Box$($w$) is solvable in $O(n)$ time, if restricted
to the class of simple expressions.

• For each word $w \in \Sigma^*$, the problem Membership$_\Diamond$($w$) is solvable in time $O(n \log n)$.

4.3 Universality 

Somewhat curiously, the universality problem is more complex for the possibility semantics $\mathcal{L}_\Diamond$.
Indeed, consider a parameterized expression $e$ over $\Sigma$, with variables in $\mathcal{W}$. For the certainty
semantics, to refute universality it suffices to guess a word $w$ and a valuation $\nu : \mathcal{W} \to \Sigma$ such that $w \notin \mathcal{L}(\nu(e))$. This
gives a Pspace upper bound for this problem, which is the best that we can do, as the universality
problem is Pspace-hard even for complete regular expressions. On the other hand, when solving
this problem for the possibility semantics, one can expect that all possible valuations for $e$ will need
to be analyzed, which increases the complexity by one exponential. (In fact, when one moves to
infinite alphabets, this problem becomes undecidable [11].) The lower bound proof, which is again
by a generic reduction, is in the appendix.

Theorem 4 • The problem Universality$_\Box$ is Pspace-complete.

• The problem Universality$_\Diamond$ is Expspace-complete.

Similarly to the nonemptiness problem, the Expspace bound for Universality$_\Diamond$ is quite
resilient, as it holds even for simple expressions (note that it makes no sense to study expressions
of star-height 0, as they denote finite languages and thus cannot be universal).

Proposition 3 The problem Universality$_\Diamond$ remains Expspace-hard over the class of simple
parameterized regular expressions.

4.4 Containment 

The bounds for the containment problem are easily obtained from the fact that both nonemptiness
and universality can be cast as its versions. Since $\Sigma^* \subseteq \mathcal{L}_\Diamond(e)$ iff Universality$_\Diamond$($e$) is true, and
$\mathcal{L}_\Box(e) \subseteq \emptyset$ iff Nonemptiness$_\Box$($e$) is false, we get Expspace lower bounds for both containment
problems. The matching upper bounds are by straightforward enumeration of valuations. Hence,
we get:

Theorem 5 Both Containment$_\Box$ and Containment$_\Diamond$ are Expspace-complete.

Containment with one fixed expression. We look at two variations of the containment
problem, when one of the expressions is fixed: Containment$_*$($e_1$, $\cdot$) asks, for a parameterized regular
expression $e_2$, whether $\mathcal{L}_*(e_1) \subseteq \mathcal{L}_*(e_2)$; and Containment$_*$($\cdot$, $e_2$) is defined similarly. Theorem 5
shows that Containment$_\Box$($\cdot$, $e_2$) and Containment$_\Diamond$($e_1$, $\cdot$) remain Expspace-complete. For the
other two versions of the problem, the proposition below (proved in the appendix) shows that the
complexity is lowered by at least one exponential.

Proposition 4 • Containment$_\Box$($e_1$, $\cdot$) is Pspace-complete.

• Containment$_\Diamond$($\cdot$, $e_2$) is coNP-complete.



5 Computing automata 

In this section, we first provide upper bounds for algorithms for building NFAs over $\Sigma$ capturing
$\mathcal{L}_\Diamond(e)$ and $\mathcal{L}_\Box(e)$, and then prove their optimality, by showing matching lower bounds on the sizes of
such NFAs. Recall that we are dealing with the problem ConstructNFA$_*$: given a parameterized
regular expression $e$, construct an NFA $A$ over $\Sigma$ so that $\mathcal{L}(A) = \mathcal{L}_*(e)$.

Proposition 5 The problem ConstructNFA$_\Diamond$ can be solved in single-exponential time, and the
problem ConstructNFA$_\Box$ can be solved in double-exponential time.

These bounds are achieved by using naive algorithms for constructing automata: namely, one
converts a parameterized regular expression $e$ over variables in a finite set $\mathcal{W}$ into an automaton
$A_e$, and then for each of the $|\Sigma|^{|\mathcal{W}|}$ valuations $\nu$ computes the automaton $\nu(A_e)$. This takes exponential time.
To obtain an NFA for $\mathcal{L}_\Diamond(e)$ one simply combines them with a nondeterministic choice; for $\mathcal{L}_\Box(e)$
one takes the product of them, resulting in double-exponential time.
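
The two naive constructions can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: automata are $\varepsilon$-free and given as sets of triples, the nondeterministic choice is modeled by allowing several initial states, the product is restricted to reachable tuples, and all names (`instances`, `union_nfa`, `product_nfa`, `accepts`, and the example automaton) are hypothetical.

```python
from itertools import product


def instances(delta, variables, alphabet):
    """Transition sets of nu(A), one per valuation nu : variables -> alphabet."""
    result = []
    for letters in product(alphabet, repeat=len(variables)):
        nu = dict(zip(variables, letters))
        result.append({(q, nu.get(a, a), r) for (q, a, r) in delta})
    return result


def union_nfa(deltas, init, finals):
    """Possibility semantics: disjoint copies, one initial state per copy."""
    delta = {((i, q), a, (i, r)) for i, d in enumerate(deltas) for (q, a, r) in d}
    inits = {(i, init) for i in range(len(deltas))}
    accepting = {(i, q) for i in range(len(deltas)) for q in finals}
    return delta, inits, accepting


def product_nfa(deltas, init, finals, alphabet):
    """Certainty semantics: synchronous product, restricted to reachable tuples."""
    start = tuple(init for _ in deltas)
    delta, seen, todo = set(), {start}, [start]
    while todo:
        state = todo.pop()
        for a in alphabet:
            # Successors of each component on letter a.
            options = [[r for (q, b, r) in d if q == state[i] and b == a]
                       for i, d in enumerate(deltas)]
            for target in product(*options):
                delta.add((state, a, target))
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    accepting = {s for s in seen if all(q in finals for q in s)}
    return delta, {start}, accepting


def accepts(delta, inits, accepting, word):
    """Simulate an epsilon-free NFA with a set of initial states."""
    current = set(inits)
    for letter in word:
        current = {r for (q, a, r) in delta if q in current and a == letter}
    return bool(current & accepting)


# Hypothetical NFA for (0x)*1(xy)*: state 0 is initial, state 2 is final.
DELTA = {(0, "0", 1), (1, "x", 0), (0, "1", 2), (2, "x", 3), (3, "y", 2)}
DELTAS = instances(DELTA, "xy", "01")
POSSIBLE = union_nfa(DELTAS, 0, {2})
CERTAIN = product_nfa(DELTAS, 0, {2}, "01")
```

The union has one copy of the automaton per valuation, hence exponentially many states, while the product ranges over tuples with one component per valuation, which is where the second exponential comes from.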

We now show that these complexities are unavoidable, as the smallest NFAs capturing $\mathcal{L}_\Diamond(e)$
or $\mathcal{L}_\Box(e)$ can be of single- or double-exponential size, respectively. We say that the sizes of minimal
NFAs for $\mathcal{L}_*$ are necessarily exponential (resp., double-exponential) if there exists a family $\{e_n\}_{n \in \mathbb{N}}$
of parameterized regular expressions such that:

• the size of each $e_n$ is $O(n)$, and

• every NFA $A$ satisfying $\mathcal{L}(A) = \mathcal{L}_*(e_n)$ has at least $2^n$ (resp., $2^{2^n}$) states.

Theorem 6 The sizes of minimal NFAs are necessarily double-exponential for $\mathcal{L}_\Box$, and necessarily
exponential for $\mathcal{L}_\Diamond$.

Proof sketch: We begin with the double-exponential bound for $\mathcal{L}_\Box$. For each $n \in \mathbb{N}$, let $e_n$ be the
following parameterized regular expression over alphabet $\Sigma = \{0, 1\}$ and variables $x_1, \ldots, x_{n+1}$:

$e_n = ((0|1)^{n+1})^* \cdot x_1 \cdots x_n \cdot x_{n+1} \cdot ((0|1)^{n+1})^*$.

Notice that each $e_n$ uses $n + 1$ variables, and is of linear size in $n$. In order to show that
every NFA deciding $\mathcal{L}_\Box(e_n)$ has at least $2^{2^n}$ states, we use the following result from [10]: if $L \subseteq \Sigma^*$ is a
regular language, and there exists a set of pairs $P = \{(u_i, v_i) \mid 1 \leq i \leq m\} \subseteq \Sigma^* \times \Sigma^*$ such that
(1) $u_i v_i \in L$, for every $1 \leq i \leq m$, and (2) $u_j v_i \notin L$, for every $1 \leq i, j \leq m$ and $i \neq j$, then every
NFA accepting $L$ has at least $m$ states.

Given a collection $S$ of words over $\{0, 1\}$, let $w_S$ denote the concatenation, in lexicographical
order, of all the words that belong to $S$, and let $w^n_{\bar{S}}$ denote the concatenation of all words in
$\{0, 1\}^{n+1}$ that are not in $S$.

Then, define a set of pairs $P_n = \{(w_S, w^n_{\bar{S}}) \mid S \subset \{0, 1\}^{n+1} \text{ and } |S| = 2^n\}$. Since there are $2^{n+1}$
binary words of length $n + 1$, there are $\binom{2^{n+1}}{2^n}$ different subsets of $\{0, 1\}^{n+1}$ of size $2^n$, and thus $P_n$
contains $\binom{2^{n+1}}{2^n} \geq 2^{2^n}$ pairs. Moreover, one can show (details in the appendix) that $\mathcal{L}_\Box(e_n)$ and $P_n$
satisfy properties (1) and (2) above, which proves the double-exponential lower bound.

To show the exponential lower bound for $\mathcal{L}_\Diamond$, define $e_n = (x_1 \cdots x_n)^*$, and let $P_n = \{(w, w) \mid
w \in \{0, 1\}^n\}$. Clearly, $P_n$ contains $2^n$ pairs. All that is left to do is to show that $\mathcal{L}_\Diamond(e_n)$ and $P_n$
satisfy properties (1) and (2) above. Details are in the appendix. □
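
For small $n$, properties (1) and (2) for the possibility language of $(x_1 \cdots x_n)^*$ and the pairs $(w, w)$ can be checked by brute force. The sketch below is an illustration only, with hypothetical function names; membership is decided by trying every valuation with Python's `re.fullmatch`.

```python
import re
from itertools import product


def in_possibility_language(word, n):
    """Membership in the possibility language of (x_1 ... x_n)* over {0, 1}."""
    return any(re.fullmatch("(" + "".join(letters) + ")*", word)
               for letters in product("01", repeat=n))


def satisfies_conditions(pairs, member):
    """Check (1) u_i v_i in L for all i, and (2) u_j v_i not in L for all i != j."""
    cond1 = all(member(u + v) for (u, v) in pairs)
    cond2 = all(not member(pairs[j][0] + pairs[i][1])
                for i in range(len(pairs))
                for j in range(len(pairs)) if i != j)
    return cond1 and cond2


def fooling_pairs(n):
    """The pairs (w, w) for every binary word w of length n."""
    return [("".join(w), "".join(w)) for w in product("01", repeat=n)]


if __name__ == "__main__":
    for n in (1, 2, 3):
        pairs = fooling_pairs(n)
        print(n, len(pairs),
              satisfies_conditions(pairs, lambda word: in_possibility_language(word, n)))
```

The check is quadratic in the number of pairs, so it only serves as a sanity test of the argument for the first few values of $n$.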

Note that the bounds of Theorem 6 apply to simple regular expressions. 

The table in Fig. 1 summarizes the main results in Sections 4 and 5. 



Problem          | Certainty ($\Box$)    | Possibility ($\Diamond$)
Nonemptiness     | Expspace-complete     | NLogspace-complete (for automata)
Membership       | coNP-complete         | NP-complete
Containment      | Expspace-complete     | Expspace-complete
Universality     | Pspace-complete       | Expspace-complete
NonemptyIntReg   | Expspace-complete     | NP-complete
ConstructNFA     | double-exponential    | single-exponential

Figure 1: Summary of complexity results



6 Extending domains of variables 

So far we assumed that variables take values in $\Sigma$: our valuations were partial maps $\nu : \mathcal{V} \to \Sigma$.
We now consider a more general case when the range of each variable is a regular subset of $\Sigma^*$.

Let $e$ be a parameterized regular expression with variables $x_1, \ldots, x_n$, and let $L_1, \ldots, L_n \subseteq \Sigma^*$
be nonempty regular languages. We shall write $\bar{L}$ for $(L_1, \ldots, L_n)$. A valuation in $\bar{L}$ is a map
$\nu : \{x_1, \ldots, x_n\} \to \Sigma^*$ such that $\nu(x_i) \in L_i$ for each $i \leq n$. Under such a valuation, each parameterized regular
expression $e$ is mapped into a usual regular expression $\nu(e)$ over $\Sigma$, in which each variable $x_i$ is
replaced by the word $\nu(x_i)$. Hence we can still define

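$\mathcal{L}_\Box(e; \bar{L}) = \bigcap \{ \mathcal{L}(\nu(e)) \mid \nu \text{ is a valuation over } \bar{L} \}$
$\mathcal{L}_\Diamond(e; \bar{L}) = \bigcup \{ \mathcal{L}(\nu(e)) \mid \nu \text{ is a valuation over } \bar{L} \}$

According to this notation, $\mathcal{L}_\Box(e) = \mathcal{L}_\Box(e; (\Sigma, \ldots, \Sigma))$, and likewise for $\mathcal{L}_\Diamond$.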

Note however that intersections and unions are now infinite, if some of the languages $L_i$ are
infinite, so we cannot conclude, as before, that we deal with regular languages. And indeed they
need not be: for example, $\mathcal{L}_\Diamond(xx; \Sigma^*)$ is the set of square words, and thus not regular.
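
The extended semantics can be explored on finite domains with a small variation of the brute-force approach. The sketch below is illustrative and makes several assumptions: variables are single characters, each domain is given as an explicit finite list of words (so an infinite domain such as $\Sigma^*$ can only be approximated by a length bound), and the names `words_up_to`, `instantiate`, `possibly` and `certainly` are hypothetical.

```python
import re
from itertools import product


def words_up_to(alphabet, max_len):
    """All words over alphabet of length at most max_len."""
    return ["".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)]


def instantiate(expr, valuation):
    """Replace each variable by its assigned word, wrapped in a non-capturing group."""
    return "".join("(?:" + re.escape(valuation[ch]) + ")" if ch in valuation else ch
                   for ch in expr)


def _valuations(domains):
    names = list(domains)
    for words in product(*(domains[x] for x in names)):
        yield dict(zip(names, words))


def possibly(expr, domains, word):
    """word is in the language of nu(expr) for some valuation nu over the domains."""
    return any(re.fullmatch(instantiate(expr, v), word) for v in _valuations(domains))


def certainly(expr, domains, word):
    """word is in the language of nu(expr) for every valuation nu over the domains."""
    return all(re.fullmatch(instantiate(expr, v), word) for v in _valuations(domains))


if __name__ == "__main__":
    domain = words_up_to("01", 2)
    print([w for w in words_up_to("01", 4) if possibly("xx", {"x": domain}, w)])
```

With the domain of `x` bounded to words of length at most 2, the words found for `xx` are exactly the squares $uu$ with $|u| \leq 2$, a finite approximation of the non-regular language of all square words.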

We now consider two cases. If each $L_i$ is a finite language, we show that all the complexity
results in Fig. 1 remain true. Then we look at the case of arbitrary regular $L_i$'s. Languages $\mathcal{L}_\Diamond(e; \bar{L})$
need not be regular anymore, but languages $\mathcal{L}_\Box(e; \bar{L})$ still are, and we prove that the complexity
bounds from the certainty column of Fig. 1 remain true. For complexity results, we assume that
in the input $(e; \bar{L})$, each domain $L_i$ is given either as a regular expression or an NFA over $\Sigma$.

6.1 The case of finite domains 

If all domain languages $L_i$ are finite, all the lower bounds apply (they were shown when each
$L_i = \Sigma$). For upper bounds, note that each finite $L_i$ contains at most exponentially many words in
the size of either a regular expression or an NFA that gives it, and each such word is of polynomial
size. Thus, the number of valuations is at most exponential in the size of the input, and each
valuation can be represented in polynomial size. The following is then straightforward.

Proposition 6 If the domains $L_i$ of all variables are finite nonempty subsets of $\Sigma^*$, then both
$\mathcal{L}_\Box(e; \bar{L})$ and $\mathcal{L}_\Diamond(e; \bar{L})$ are regular languages, and all the complexity bounds on the problems
related to them are exactly the same as stated in Fig. 1.

6.2 The case of infinite domains

We have already seen that if just one of the domains is infinite, then $\mathcal{L}_\Diamond(e; \bar{L})$ need not be regular
(the $\mathcal{L}_\Diamond(xx; \Sigma^*)$ example). Somewhat surprisingly, however, in the case of the certainty semantics,
we recover not only regularity but also all the complexity bounds.

Theorem 7 For each parameterized regular expression $e$ using variables $x_1, \ldots, x_n$ and for each
$n$-tuple $\bar{L}$ of regular languages over $\Sigma$, the language $\mathcal{L}_\Box(e; \bar{L}) \subseteq \Sigma^*$ is regular. Moreover, the
complexity bounds are exactly the same as in the $\Box$ column of the table in Fig. 1.

Proof sketch: We only need to be concerned about regularity of $\mathcal{L}_\Box(e; \bar{L})$ and upper complexity
bounds, as the proofs of lower bounds apply for the case when all $L_i = \Sigma$. For this, it suffices to
prove that there is a finite set $U$ of NFAs so that $\mathcal{L}_\Box(e; \bar{L}) = \bigcap_{A \in U} \mathcal{L}(A)$. Moreover, it follows from
analyzing the proofs of upper complexity bounds that the complexity results will remain the same
if the following can be shown about the set $U$:

• its size is at most exponential in the size of the input;

• checking whether $A \in U$ can be done in time polynomial in the size of $A$;

• each $A \in U$ is of size polynomial in the size of the input $(e; \bar{L})$.

To show these, take $A_e$ and from it construct a reduced automaton $A'_e$ in which all transitions
$(q, x_i, q')$ are eliminated whenever $L_i$ is infinite. We then show that $\mathcal{L}_\Box(A_e; \bar{L}) = \mathcal{L}_\Box(A'_e; \bar{L})$ (the
definition of $\mathcal{L}_\Box$ extends naturally from regular expressions to automata for arbitrary domains).
This observation generates a finite set $U$ of NFAs which result from applying valuations with finite
codomains to $A'_e$. We prove in the appendix that these automata satisfy the required properties.

7 Future work 

For most bounds (except universality and containment), the complexity under the possibility
semantics is reasonable, while for the certainty semantics it is quite high (i.e., double-exponential in
practice). At the same time, the concept of $\mathcal{L}_\Box(e)$ captures many query answering scenarios over
graph databases with incomplete information [4]. One of the future directions of this work is to
devise better algorithms for problems related to the certainty semantics under restrictions arising
in the context of querying graph databases.

Another line of work has to do with closure properties: we know that results of Boolean
operations on languages $\mathcal{L}_\Box(e)$ and $\mathcal{L}_\Diamond(e)$ are regular and can be represented by NFAs; the bounds
on sizes of such NFAs follow from the results shown here. However, it is conceivable that such
NFAs can be succinctly represented by parameterized regular expressions. To be concrete, one
can easily derive from results in Section 5 that there is a doubly-exponential size NFA $A$ so that
$\mathcal{L}(A) = \mathcal{L}_\Box(e_1) \cap \mathcal{L}_\Box(e_2)$, and that this bound is optimal. However, it leaves open a possibility that
there is a much more succinct parameterized regular expression $e$ so that $\mathcal{L}_\Box(e) = \mathcal{L}_\Box(e_1) \cap \mathcal{L}_\Box(e_2)$;
in fact, nothing that we have shown contradicts the existence of a polynomial-size expression with
this property. We plan to study bounds on such regular expressions in the future.

8 APPENDIX: COMPLETE PROOFS

Proof of Theorem 1: 

(Part 1) We begin with the upper bound. Let $e$ be a parameterized regular expression, and as
usual assume that $W$ is the set of variables mentioned in $e$. By definition, $\mathcal{L}_\Box(e)$ is the
intersection of the languages $\mathcal{L}(\nu(e))$, for all possible valuations $\nu : W \to \Sigma$ for $e$. Clearly, the
total number of those valuations is $|\Sigma|^{|W|}$. Thus, since there are only exponentially many valuations,
an Expspace algorithm can guess, symbol by symbol, a word $w \in \mathcal{L}_\Box(e)$, while checking, in parallel,
that $w$ belongs to $\mathcal{L}(\nu(e))$ for every such valuation $\nu : W \to \Sigma$.
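
The two semantics can be illustrated on small instances by brute force. The sketch below is not the Expspace procedure described above; it simply enumerates all $|\Sigma|^{|W|}$ valuations. It assumes a hypothetical representation in which a parameterized expression is a Python `re` pattern whose variables are single characters (here `x` and `y`) that do not occur in the alphabet.

```python
import itertools
import re

def valuations(variables, alphabet):
    """Yield every valuation as a dict from variable character to letter."""
    for letters in itertools.product(alphabet, repeat=len(variables)):
        yield dict(zip(variables, letters))

def instantiate(expr, valuation):
    """Replace each variable character in the pattern by its assigned letter."""
    return expr.translate(str.maketrans(valuation))

def in_possibility(expr, variables, alphabet, word):
    """The word is accepted under SOME valuation."""
    return any(re.fullmatch(instantiate(expr, v), word) is not None
               for v in valuations(variables, alphabet))

def in_certainty(expr, variables, alphabet, word):
    """The word is accepted under EVERY valuation."""
    return all(re.fullmatch(instantiate(expr, v), word) is not None
               for v in valuations(variables, alphabet))

e = "(0x)*1(xy)*"  # the running example, with variables x and y
print(in_possibility(e, "xy", "01", "01110"))
print(in_certainty(e, "xy", "01", "01110"))
print(in_certainty(e, "xy", "01", "1"))
```

The enumeration is exponential in the number of variables, which is the source of the blow-up that the upper bound has to absorb.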

The proof for the lower bound relies heavily on the following lemma: 

Lemma 3 Given a set $e_1, \ldots, e_k$ of parameterized expressions over an alphabet $\Sigma$ that contains at
least two symbols, it is possible to build, in polynomial time with respect to $e_1, \ldots, e_k$, an expression
$e'$ such that $\mathcal{L}_\Box(e')$ is empty if and only if $\mathcal{L}_\Box(e_1) \cap \cdots \cap \mathcal{L}_\Box(e_k)$ is empty.

Proof: Let $e_1, \ldots, e_k$ be as stated in the Lemma, and let $a, b \in \Sigma$. We use $(\Sigma - a)$ as a shorthand
for the expression whose language is the union of every symbol in $\Sigma$ different from $a$, and define
$A^i = [(\Sigma - a)^* \cdot a \cdot (\Sigma - a)^*]^i$, for $1 \le i \le k - 1$. Finally, let $x_1, \ldots, x_{k-1}$ be fresh variables. We
define $e'$ as

$$(\Sigma - a)^* \cdot x_1 \cdot (\Sigma - a)^* \cdot x_2 \cdot (\Sigma - a)^* \cdots x_{k-1} \cdot (\Sigma - a)^* \cdot$$

$$(b a^k b \cdot e_1 \mid b \cdot A^1 \cdot b a^k b \cdot e_2 \mid b \cdot A^2 \cdot b a^k b \cdot e_3 \mid \cdots \mid b \cdot A^{k-1} \cdot b a^k b \cdot e_k).$$

To prove the lemma, consider first a word $w$ that belongs to $\mathcal{L}_\Box(e_1) \cap \cdots \cap \mathcal{L}_\Box(e_k)$. Then, it
can be proved that the word $(c^k a b)^{k-1} b a^k b w$ belongs to $\mathcal{L}_\Box(e')$, where $c$ is the concatenation (say,
in lexicographical order) of all the symbols in $\Sigma$ different from $a$. On the other hand, assume that
a word $w$ belongs to $\mathcal{L}_\Box(e')$. We need the following claim:

Claim 1 The word $w$ can be decomposed into $u \cdot b a^k b \cdot v$, where $u$ contains exactly $k - 1$ appearances
of $a$.

Proof: By inspection of $e'$, we conclude that $w$ must contain the substring $b a^k b$. Now, let
$w = u \cdot b a^k b \cdot v$, where $u$ does not contain the substring $b a^k b$. Assume first that $u$ contains fewer than
$k - 1$ appearances of the symbol $a$. Then, consider a valuation $\nu$ that maps each variable in $e'$ to
the symbol $a$. Since $\nu(e')$ is of form

$$((\Sigma - a)^* a)^{k-1} (\Sigma - a)^* \cdot (b a^k b \cdot \nu(e_1) \mid b \cdot A^1 \cdot b a^k b \cdot \nu(e_2) \mid b \cdot A^2 \cdot b a^k b \cdot \nu(e_3) \mid \cdots \mid b \cdot A^{k-1} \cdot b a^k b \cdot \nu(e_k)),$$

we conclude that the language of $\nu(e')$ cannot contain any word that starts with $u \, b a^k b$, since all
the words in $\mathcal{L}(\nu(e'))$ must start with a prefix in the language

$$((\Sigma - a)^* a)^{k-1} (\Sigma - a)^* \cdot b.$$

Next, assume that $u$ contains more than $k - 1$ appearances of the symbol $a$, and consider a valuation
$\nu'$ that maps each variable in $e'$ to the symbol $b$. Then again, $\nu'(e')$ is of form

$$((\Sigma - a)^* b)^{k-1} (\Sigma - a)^* \cdot (b a^k b \cdot \nu'(e_1) \mid b \cdot A^1 \cdot b a^k b \cdot \nu'(e_2) \mid b \cdot A^2 \cdot b a^k b \cdot \nu'(e_3) \mid \cdots \mid b \cdot A^{k-1} \cdot b a^k b \cdot \nu'(e_k)).$$

Then, notice that any word in $\mathcal{L}(\nu'(e'))$ is such that the symbol $a$ cannot appear more than $k - 1$
times before the substring $b a^k b$. We conclude that $\mathcal{L}(\nu'(e'))$ cannot contain a word starting with
$u \cdot b a^k b \cdot v$. □

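Using this Claim, it is now straightforward to show that, if a valuation $\nu$ assigns the symbol $a$ to
exactly $j$ variables in $\{x_1, \ldots, x_{k-1}\}$ ($0 \le j \le k - 1$), then $v$ must belong to $\mathcal{L}_\Box(e_{k-j})$: the prefix of
$\nu(e')$ consumes $j$ of the $k - 1$ appearances of $a$ in $u$, so the remaining $k - 1 - j$ appearances must be
matched by $A^{k-1-j}$, which is followed by $b a^k b \cdot e_{k-j}$. This shows that $v$ belongs to
$\mathcal{L}_\Box(e_1) \cap \cdots \cap \mathcal{L}_\Box(e_k)$, which finishes the proof of the Lemma. □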
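
The construction of $e'$ can be tried out on toy inputs. The following is a minimal sketch under stated assumptions, not part of the proof: expressions are Python `re` patterns, the fresh variables are the hypothetical tokens `<x1>`, ..., `<x(k-1)>`, the alphabet consists of plain alphanumeric characters, and the $e_i$ used in the demonstration are variable-free so that only the fresh variables need to be valuated.

```python
import itertools
import re

def lemma3_expression(exprs, alphabet, a="a", b="b"):
    """Build e' from e_1..e_k and return it with its fresh variable tokens."""
    k = len(exprs)
    not_a = "[" + "".join(s for s in alphabet if s != a) + "]*"  # (Sigma - a)*
    block = f"(?:{not_a}{a}{not_a})"                              # one factor of A^i
    marker = b + a * k + b                                        # b a^k b
    fresh = [f"<x{i}>" for i in range(1, k)]
    prefix = not_a + "".join(x + not_a for x in fresh)
    alternatives = [f"{marker}(?:{exprs[0]})"]
    alternatives += [f"{b}{block * i}{marker}(?:{exprs[i]})" for i in range(1, k)]
    return prefix + "(?:" + "|".join(alternatives) + ")", fresh

def in_certainty(expr, variables, alphabet, word):
    """Check that the word is accepted under every valuation of the tokens."""
    for letters in itertools.product(alphabet, repeat=len(variables)):
        pattern = expr
        for var, letter in zip(variables, letters):
            pattern = pattern.replace(var, letter)
        if re.fullmatch(pattern, word) is None:
            return False
    return True

def shortest_certain_word(expr, variables, alphabet, max_len):
    """Bounded search for a word in the certainty language (None if not found)."""
    for length in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=length):
            word = "".join(letters)
            if in_certainty(expr, variables, alphabet, word):
                return word
    return None

# e_1 = ab and e_2 = a(a|b) share the word ab; the witness is (c^k a b)^(k-1) b a^k b w.
e_prime, fresh = lemma3_expression(["ab", "a[ab]"], "ab")
print(in_certainty(e_prime, fresh, "ab", "bbab" + "baab" + "ab"))

# e_1 = ab and e_2 = ba have an empty intersection.
e_empty, fresh_empty = lemma3_expression(["ab", "ba"], "ab")
print(shortest_certain_word(e_empty, fresh_empty, "ab", 10))
```

The bounded search is only a sanity check on small inputs; it says nothing about words longer than `max_len`.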

Next we continue with the proof of the theorem; specifically, we show the Expspace lower
bound for the problem Nonemptiness$_\Box$. The proof is by a reduction from the acceptance problem
of Turing machines. Let $L$ be a language that belongs to Expspace, and let $M$ be a Turing
machine that decides $L$ in Expspace. Given an input $a = a_0, \ldots, a_{k-1}$, we construct in polynomial
time with respect to $M$ and $a$ a parameterized regular expression $e_{M,a}$ such that $\mathcal{L}_\Box(e_{M,a}) \neq \emptyset$ if
and only if $M$ accepts $a$.

Assume that $M = (Q_M, \Sigma_M, \Gamma_M, q_0, \{q_m\}, \delta_M)$; that is, the states of $M$ are $Q_M =
\{q_0, \ldots, q_m\}$, the initial state is $q_0$, the input alphabet of $M$ is $\Sigma_M$, and the tape alphabet of $M$ is
$\Gamma_M = \Sigma_M \cup \{B\}$, that is, $M$ uses an additional blank symbol $B$; and the set of transitions
of $M$ is $\delta_M$. Without loss of generality, we assume that $M$ has only one tape, starts with the
input copied on the first $|a|$ cells of its tape, that $M$ has only one final state $q_m$, and that no
transition is defined for that state. Moreover, it will be easier for us to assume that the machine
always ends after an odd number of steps (although the reduction can be improved to work without
this assumption). Since $M$ decides $L$ in Expspace, there is a polynomial $S()$ such that, for every
input $a$ over $\Sigma_M$, $M$ decides $a$ using space of order $2^{S(|a|)}$. Assume for notational convenience that
$S(|a|) = n$.

Due to Lemma 3, it suffices to construct in polynomial time a set $E$ containing a polynomial
number of parameterized regular expressions, such that $\bigcap_{e \in E} \mathcal{L}_\Box(e)$ is nonempty if and only if $M$
accepts on input $a$. But before we describe the set $E$, we need some notation. We use the shorthand
$[i]$ to denote the binary representation of the number $i$ as a string of $n$ characters. For example, $[0]$
corresponds to the word $0^n$, and $[2]$ corresponds to the word $0^{n-2}10$. Roughly speaking, any word
accepted by all the expressions in $E$ should represent a valid run of the Turing machine. In order
to do so, $E$ is constructed so that all words that belong to $\bigcap_{e \in E} \mathcal{L}_\Box(e)$ represent a sequence

$$[\mathit{even}] \cdot L_0 \cdot [\mathit{even}] \cdot [\mathit{odd}] \cdot L_1 \cdot [\mathit{odd}] \cdot [\mathit{even}] \cdot L_2 \cdot [\mathit{even}] \cdots,$$

where each of the $L$'s represents a configuration of $M$, coded as

$$(\mathit{action} \cdot \mathit{state}_b \cdot [0] \cdot \mathit{state}_a \cdot \mathit{action} \cdot \mathit{state}_b \cdot [1] \cdot \mathit{state}_a \cdots \mathit{action} \cdot \mathit{state}_b \cdot [2^n - 1] \cdot \mathit{state}_a)^*.$$

Each construct $\mathit{action} \cdot \mathit{state}_b \cdot [i] \cdot \mathit{state}_a$ represents the content of the $i$-th cell of the Turing machine
before and after one given point of the computation, plus the action that was performed in that
cell in the given step. More precisely, the word $\mathit{action}$ is a three-bit string that codes either
nothing, read/write, or advance head; $\mathit{state}_b$ is a binary word coding $\Gamma_M \cup \Gamma_M \times Q_M$, representing
the content of the $i$-th cell before a given step in the computation of $M$ (if $\mathit{state}_b$ codes a pair in
$\Gamma_M \times Q_M$ this represents that $M$ was pointing at that cell); $[i]$ is the binary representation of the
number $i$ ($0 \le i \le 2^n - 1$) that represents the cell's position; and $\mathit{state}_a$ is a binary word coding
$\Gamma_M \cup \Gamma_M \times Q_M$ that represents the content of the $i$-th cell after the computation.

Thus, intuitively, each of these words can be seen as a sequence of descriptions of $M$, each
description consisting of a sequence of tuples that encode, for each cell, the construct action, state
before the action, position and state after the action. The idea behind the reduction is that $E$
should only accept those words representing a valid computation that ends in a final state. The
rest of the reduction is devoted to constructing such a set $E$. We divide the set $E$ into sets $E_{form}$, $E_i$,
$E_f$, $E_{order}$, $E_{state}$ and $E_{action}$. But let us first show our coding scheme.

Let $p = \log(|\Gamma_M| + |\Gamma_M||Q_M|)$. We code each content of a cell (that is, strings corresponding to
$\mathit{state}_a$ and $\mathit{state}_b$, coding $\Gamma_M \cup \Gamma_M \times Q_M$) as a $p$-bit string. Moreover, we code the actions as 3-bit
strings. For simplicity, we denote these strings by $[\mathit{even}]$, $[\mathit{odd}]$, $[\mathit{nothing}]$, $[\mathit{read}]$ and $[\mathit{head}]$, where
$[\mathit{even}] = 000$, $[\mathit{odd}] = 001$, $[\mathit{nothing}] = 100$, $[\mathit{read}] = 101$ and $[\mathit{head}] = 111$. Finally, we use $[\mathit{action}]$
as a shorthand for $[\mathit{nothing}] \mid [\mathit{read}] \mid [\mathit{head}]$ (or, equivalently, $100 \mid 101 \mid 111$).
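
The coding scheme can be made concrete with a few helper functions. This is an illustrative sketch only: the function names are hypothetical, cell contents are assumed to be given as integer codes (the proof does not fix a particular numbering), and $p$ is rounded up so that every content receives a code.

```python
import math
import re

# The five 3-bit tags fixed by the coding scheme.
TAGS = {"even": "000", "odd": "001", "nothing": "100", "read": "101", "head": "111"}

def position(i, n):
    """[i]: the binary representation of i as a string of n characters."""
    return format(i, f"0{n}b")

def content_width(num_symbols, num_states):
    """p = log(|Gamma| + |Gamma||Q|), rounded up (an assumption of this sketch)."""
    return math.ceil(math.log2(num_symbols + num_symbols * num_states))

def encode_cell(action, before, i, after, n, p):
    """action . state_b . [i] . state_a, with contents given as integer codes."""
    return TAGS[action] + format(before, f"0{p}b") + position(i, n) + format(after, f"0{p}b")

def cell_pattern(n, p):
    """[action](0|1)^p (0|1)^n (0|1)^p written as a Python regular expression."""
    return f"(?:100|101|111)[01]{{{p}}}[01]{{{n}}}[01]{{{p}}}"

n, p = 4, content_width(3, 2)
cell = encode_cell("read", 5, 2, 7, n, p)
print(position(0, n), position(2, n), p, cell)
print(re.fullmatch(cell_pattern(n, p), cell) is not None)
```

Each encoded cell has length $3 + 2p + n$, which is the block that the expressions below repeat.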

We now define the expressions in $E$. First, $E_{form}$ contains the expression

$$([\mathit{even}]\,([\mathit{action}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*\,[\mathit{even}]\;[\mathit{odd}]\,([\mathit{action}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*\,[\mathit{odd}])^*$$

The intuition behind $E_{form}$ is that it ensures that each word in $\bigcap_{e \in E} \mathcal{L}_\Box(e)$ is of the form that
was explained in the previous paragraphs.

Next, we define $E_{order}$, which intuitively forces each of the descriptions of the $L$'s in $\bigcap_{e \in E} \mathcal{L}_\Box(e)$
to be of form

$$(\mathit{action} \cdot \mathit{state}_b \cdot [0] \cdot \mathit{state}_a \cdot \mathit{action} \cdot \mathit{state}_b \cdot [1] \cdot \mathit{state}_a \cdots \mathit{action} \cdot \mathit{state}_b \cdot [2^n - 1] \cdot \mathit{state}_a)^*.$$

That is, the slots used to code the position of the cell in each description have to be arranged in
numerical order. We split $E_{order}$ into $E_{order}^{even}$ and $E_{order}^{odd}$. The sets are as follows: $E_{order}^{even}$ contains
the expression

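$$([\mathit{even}] \mid [\mathit{odd}]\,[\mathit{action}](0 \mid 1)^p\,[0]\,(0 \mid 1)^p\,[\mathit{even}] + [\mathit{odd}]) \cdot$$

$$([\mathit{even}] \mid [\mathit{odd}]\,[\mathit{action}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p\,[\mathit{even}] + [\mathit{odd}])^*$$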

stating, intuitively, that each portion of the words belonging to $E_{order}$ starts with a $[0]$ in the
slot corresponding to the cell position. In addition, for each $1 \le m \le n - 1$, $E_{order}^{even}$ contains the
expression



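$$\Big( ([\mathit{even}] \mid [\mathit{odd}])$$

$$\big( ([\mathit{action}](0 \mid 1)^p (0 \mid 1)^{m-1}\,0\,(0 \mid 1)^{n-m-1}\,0\,(0 \mid 1)^p\,[\mathit{action}](0 \mid 1)^p (0 \mid 1)^{m-1}\,0\,(0 \mid 1)^{n-m-1}\,1\,(0 \mid 1)^p) \mid$$

$$([\mathit{action}](0 \mid 1)^p (0 \mid 1)^{m-1}\,1\,(0 \mid 1)^{n-m-1}\,0\,(0 \mid 1)^p\,[\mathit{action}](0 \mid 1)^p (0 \mid 1)^{m-1}\,1\,(0 \mid 1)^{n-m-1}\,1\,(0 \mid 1)^p) \big)^*$$

$$([\mathit{even}] \mid [\mathit{odd}]) \Big)^*$$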



The idea is that each substring $[i]$ marking an even position in the tape has to be followed
by its successor $[j]$, by forcing the word to be a concatenation of consecutive constructs
$\mathit{action}\,\mathit{state}_b\,[i]\,\mathit{state}_a\;\mathit{action}\,\mathit{state}_b\,[j]\,\mathit{state}_a$ in which (1) $[i]$ ends in 0 and $[j]$ ends in 1, and (2) the
$t$-th bits of $[i]$ and $[j]$ are equal, for every $1 \le t \le n - 1$ (where the last bit of $[i]$ is the $n$-th one). In
the same fashion, $E_{order}^{odd}$ ensures that each string representing an odd position is followed by
its successor, and that the last of these substrings has to be $[2^n - 1]$, or $1^n$. We omit the description
since it follows the same lines as that of $E_{order}^{even}$.

Next, $E_i$ ensures that the first description corresponds to the initial configuration of $M$: it
contains the expression

$$[\mathit{even}]\,[\mathit{nothing}][q_0, a_0][0][q_0, a_0]\,[\mathit{nothing}][a_1][1][a_1] \cdots [\mathit{nothing}][a_{k-1}][k-1][a_{k-1}]$$

$$([\mathit{nothing}][B](0 \mid 1)^n[B])^*\,[\mathit{even}]\,\Sigma_M^*$$

Here we abuse the notation, and put $[a]$ instead of the string coding the content $a \in \Gamma_M$, and $[q, a]$
instead of the string coding the content $a \in \Gamma_M$ and stating that the head is in the corresponding
position, in a state $q \in Q_M$.

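Furthermore, the expression in $E_f$ ensures that the last description ends in a final state: it
contains only the expression

$$\Sigma_M^* \bigcup_{q \in Q_M,\, a \in \Gamma_M} ([\mathit{even}] \mid [\mathit{odd}])\,([\mathit{action}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*\,[\mathit{head}](0 \mid 1)^p (0 \mid 1)^n [q, a]\,([\mathit{action}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*\,([\mathit{even}] \mid [\mathit{odd}])$$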

Next, we ensure that the state is carried along the descriptions: the state after the action in
the $j$-th description has to coincide with the state carried along the $(j+1)$-th description. This is
accomplished with parameterized expressions $E_{state}^{even}$ and $E_{state}^{odd}$, where $E_{state}^{even}$ requires that, for all $i$,
$0 \le i \le 2^n - 1$, the state after an even computation in the $i$-th cell of the tape corresponds exactly
to the state before the next computation.



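$$E_{state}^{even} = \bigcup_{\mathit{state} \in \Gamma_M \cup \Gamma_M \times Q_M} \big( [\mathit{even}]\,([\mathit{action}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*$$

$$[\mathit{action}](0 \mid 1)^p\,x_1 \cdots x_n\,[\mathit{state}]\,([\mathit{action}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*\,[\mathit{even}]$$

$$[\mathit{odd}]\,([\mathit{action}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*\,[\mathit{action}][\mathit{state}]\,x_1 \cdots x_n\,(0 \mid 1)^p$$

$$([\mathit{action}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*\,[\mathit{odd}] \big)$$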



Expression $E_{state}^{odd}$ is defined accordingly, simply by interchanging strings $[\mathit{even}]$ and $[\mathit{odd}]$, but
carefully allowing for the possibility that a word representing a computation might end in an odd
configuration (that is, an odd configuration may be followed by an even configuration with the
aforementioned properties, or may be the last configuration of the computation).

Finally, we describe the set $E_{action}$. Intuitively, the expressions in this set ensure that in each
configuration a step is taken that is valid w.r.t. the transitions in $\delta_M$. Roughly speaking, it forces
each configuration to be of form $L_\delta$, for some transition $\delta \in \delta_M$. The formal description is as
follows. For each transition in $\delta_M$ of form $\delta(q, a) = (q', a', \{\rightarrow, \leftarrow\})$, let $L_\delta$ be the language

$$\bigcup_{b \in \Gamma_M} \big( ([\mathit{even}] \mid [\mathit{odd}])\,([\mathit{nothing}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*$$

$$[\mathit{read}][q, a](0 \mid 1)^n[a']\,[\mathit{head}][b](0 \mid 1)^n[q', b]\,([\mathit{nothing}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*\,([\mathit{even}] \mid [\mathit{odd}]) \big),$$

if $\delta$ advances to the right, or the language

$$\bigcup_{b \in \Gamma_M} \big( ([\mathit{even}] \mid [\mathit{odd}])\,([\mathit{nothing}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*$$

$$[\mathit{head}][b](0 \mid 1)^n[q', b]\,[\mathit{read}][q, a](0 \mid 1)^n[a']\,([\mathit{nothing}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*\,([\mathit{even}] \mid [\mathit{odd}]) \big),$$

if $\delta$ moves to the left. Then, $E_{action}$ includes the expression

$$[\mathit{even}]\,([\mathit{nothing}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p)^*\,[\mathit{even}]\,\Big( \bigcup_{\delta \in \delta_M} L_\delta \Big)^*.$$

Moreover, it also includes the expression

$$\Big( ([\mathit{even}] \mid [\mathit{odd}]) \Big( \bigcup_{\mathit{state} \in \Gamma_M \cup \Gamma_M \times Q_M} [\mathit{nothing}][\mathit{state}](0 \mid 1)^n[\mathit{state}] \;\mid\; [\mathit{read}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p \;\mid\; [\mathit{head}](0 \mid 1)^p (0 \mid 1)^n (0 \mid 1)^p \Big)^* ([\mathit{even}] \mid [\mathit{odd}]) \Big)^*,$$

which ensures that the content of a cell does not change if no action is performed on it.

It follows immediately from the construction that $\bigcap_{e \in E} \mathcal{L}_\Box(e)$ is nonempty if and only if $M$ accepts
on input $a$.

(Part 2) Let $A$ be an NFA over $\Sigma \cup W$, where $W$ is a set of variables. Then, for every two
valuations $\nu_1$ and $\nu_2$, we have $\mathcal{L}(\nu_1(A)) \neq \emptyset$ iff $\mathcal{L}(\nu_2(A)) \neq \emptyset$. Indeed, a path from an initial to a
final state in one automaton guarantees the existence of such a path in the other one. Hence to
check $\mathcal{L}_\Diamond(A) \neq \emptyset$ one can take an arbitrary valuation $\nu$ and check whether $\mathcal{L}(\nu(A)) \neq \emptyset$; this of
course is reachability and can be done in NLogspace. Hardness is already known for automata
that do not use variables.
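
Since the labels play no role in this argument, the check reduces to plain graph reachability. The sketch below uses a breadth-first search for readability; it runs in linear time but is not itself a logarithmic-space procedure. The representation of the NFA as a list of `(source, label, target)` triples is an assumption made for the example.

```python
from collections import deque

def possibility_nonempty(initial, finals, transitions):
    """Decide nonemptiness under the possibility semantics by reachability.

    transitions: iterable of (source, label, target); labels may be letters
    or variables and are ignored, as in the argument above.
    """
    successors = {}
    for source, _label, target in transitions:
        successors.setdefault(source, set()).add(target)
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if state in finals:
            return True
        for nxt in successors.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# An NFA for (0x)*1 with variable x: state 2 is final.
nfa = [(0, "0", 1), (1, "x", 0), (0, "1", 2)]
print(possibility_nonempty(0, {2}, nfa))
print(possibility_nonempty(0, {3}, nfa))
```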

Proof of Lemma 2 : 



Consider first the case when $\Sigma$ has at least two symbols. Then, by using the construction of Lemma
3 it is easy to see that the resulting expression $e'$ is of size $O(|\Sigma| \cdot k + k \cdot (|\Sigma| + n + k))$. We have
assumed that $k \le n$, and it is reasonable to assume that $|\Sigma| \le k \le n$, thus the resulting expression
is of size $O(k \cdot n)$.

If $\Sigma$ contains a single symbol, then for $1 \le i \le k$ we have that $\mathcal{L}_\Box(e_i) = \mathcal{L}(e'_i)$, where $e'_i$ is the
expression resulting from replacing all parameters in $e_i$ with the symbol from $\Sigma$. Afterwards, $\Sigma$
can be augmented with an extra symbol and the construction of Lemma 3 can be used just as we
have previously explained, except this time on input $e'_1, \ldots, e'_k$.
Proof of Corollary 1:

We begin the proof with some observations. Given a Turing machine $M$ that decides on input $a$
using exponential space, the reduction of Theorem 1 essentially constructs a set of parameterized
regular expressions $E_{M,a}$, such that every word $w \in \bigcap_{e \in E_{M,a}} \mathcal{L}_\Box(e)$ represents a run (or, more
precisely, a sequence of configurations) of $M$ on input $a$.

Moreover, the same proof also shows how to construct a regular expression $e_{M,a}$ such that
$\bigcap_{e \in E_{M,a}} \mathcal{L}_\Box(e)$ is empty if and only if $\mathcal{L}_\Box(e_{M,a})$ is empty. Notice as well that from the proof of
Lemma 3 it is easy to see that the size of every word $w \in \mathcal{L}_\Box(e_{M,a})$ is at least the size of the
smallest word $w' \in \bigcap_{e \in E_{M,a}} \mathcal{L}_\Box(e)$.

Next, we define the sequence $\{e_n\}_{n \in \mathbb{N}}$ of parameterized regular expressions as stated in the
corollary. Clearly, for each $n \in \mathbb{N}$, it is possible to construct a deterministic Turing machine $M_n$
over alphabet $\Sigma = \{0, 1\}$ that on input $1^n$ works for exactly $2^{2^n}$ steps, using at most $2^n$ cells.

Thus, for each $n$, we define $e_n$ as the expression such that $\mathcal{L}_\Box(e_n)$ is empty if and only if
$\bigcap_{e \in E_n} \mathcal{L}_\Box(e)$ is empty, constructed as in the proof of Lemma 3.

According to the reduction in the proof of Theorem 1, $\bigcap_{e \in E_n} \mathcal{L}_\Box(e)$ contains a single word,
of length greater than $2^{2^n}$, representing the single run of $M_n$ on input $1^n$. Then, by our above
observations, we conclude that all words in $\mathcal{L}_\Box(e_n)$ are of size at least $2^{2^n}$.

Proof of Proposition 1 : 

(Part 1) The Expspace reduction of Theorem 1 can be modified so that all the expressions in
$E$ are simple. In order to do this, we shall modify the coding scheme of the previous reduction.
Basically, the new reduction should include one preceding bit in each of the slots corresponding
to $\mathit{action}$, $\mathit{state}_b$ (content of the cell before the action), position and $\mathit{state}_a$ (content of the cell
after the action). This extra bit is set to 0 if these are part of an even configuration (that is, a
configuration between two strings $[\mathit{even}]$), or 1 if they are part of an odd configuration.

Under this modified coding, all that is left to do is to modify the set $E$ according to the new
coding. The only substantial change is in the sets $E_{state}^{even}$ and $E_{state}^{odd}$, the rest being straightforward.
Thus, we only state these expressions. Define $E_{state}^{even}$ as follows:

[ \J ([ew]((0 | 1) 4 (0 | l) p+1 (0 | l) n+1 (0 | l) p+1 )*((0 | 1) 4 (0 | l) p+1 

(x 1 ---x n (0[state]{0{0 | 1) 3 0(0 | l) p (0 | l) n 0(0 | l) p )*[et;en] 
[odd](l(0 | 1) 3 1(0 | l) p (0 | l) n l(0 | 1) P )*1(0 | lfl[state\) | 

(1(0 | l) p l(0 I 1) 3 1(0 I l) p (0 I l) n l(0 I l) p )*[odd]) 



And, $E_{state}^{odd}$ is defined accordingly, by mutually interchanging all appearances of $[\mathit{even}]$ and $[\mathit{odd}]$.
Notice that, from the form of the rest of the expressions in $E$, the only words $w \in \mathcal{L}_\Box(E_{state}^{even})$ that
belong to $\bigcap_{e \in E} \mathcal{L}_\Box(e)$ are those in which after every string of form $\nu(x_1) \cdots \nu(x_n)$ for some valuation
$\nu : \{x_1, \ldots, x_n\} \to \Sigma$ we follow with a string in

$$\begin{array}{l}
0[\mathit{state}]\,(0(0 \mid 1)^3\,0(0 \mid 1)^p\,(0 \mid 1)^n\,0(0 \mid 1)^p)^*\,[\mathit{even}] \\
{}[\mathit{odd}]\,(1(0 \mid 1)^3\,1(0 \mid 1)^p\,(0 \mid 1)^n\,1(0 \mid 1)^p)^* \\
1(0 \mid 1)^3\,1[\mathit{state}]\,\nu(x_1) \cdots \nu(x_n)\,1(0 \mid 1)^p \\
(1(0 \mid 1)^3\,1(0 \mid 1)^p\,(0 \mid 1)^n\,1(0 \mid 1)^p)^*\,[\mathit{odd}]
\end{array} \qquad (1)$$

Thus, under this modified coding, expressions $E_{state}^{even}$ and $E_{state}^{odd}$ have the same intended behavior
as in the reduction of Theorem 1. We omit the rest of the reduction since it goes along the same
lines as the previous version.

(Part 2): It is easy to see that, if $e$ does not use Kleene star, then all the words $w \in \mathcal{L}_\Box(e)$ are
of size polynomial with respect to the size of $e$. This immediately gives a $\Sigma_2^p$ algorithm for the
nonemptiness problem: given a parameterized regular expression $e$ not using Kleene star, guess a
word $w$ and a valuation $\nu$ for $e$, and check that $w \in \mathcal{L}_\Box(e)$ (which, from Theorem 2, can be performed
using an NP oracle). The $\Sigma_2^p$ hardness is established via a reduction from the complement of the
$\forall\exists$ 3-SAT satisfiability problem. This problem is defined as follows: a formula $\varphi$ is the conjunction
of clauses $\{C_1, \ldots, C_p\}$, each of which has 3 variables from the disjoint union of $\{x_1, \ldots, x_m\}$ and
$\{y_1, \ldots, y_t\}$. The problem asks whether there exists an assignment $\sigma_x$ for $\{x_1, \ldots, x_m\}$ such that
for every assignment $\sigma_y$ for $\{y_1, \ldots, y_t\}$ it is the case that $\varphi$ is not satisfied.

Let $\varphi := \forall x_1 \cdots \forall x_m \exists y_1 \cdots \exists y_t \; C_1 \wedge \cdots \wedge C_p$ be an instance of $\forall\exists$ 3-SAT. From $\varphi$ we construct
in polynomial time a parameterized regular expression $e$ over alphabet $\Sigma = \{0, 1\}$ such that there
exists an assignment $\sigma_x$ for $\{x_1, \ldots, x_m\}$ such that for every assignment $\sigma_y$ for $\{y_1, \ldots, y_t\}$ it is the
case that $\varphi$ is not satisfied if and only if $\mathcal{L}_\Box(e)$ is not empty.

Let each $C_j$ ($1 \le j \le p$) be of form $(\ell_j^1 \vee \ell_j^2 \vee \ell_j^3)$, where each literal $\ell_j^i$, for $1 \le j \le p$ and
$1 \le i \le 3$, is either a variable in $\{x_1, \ldots, x_m\}$ or $\{y_1, \ldots, y_t\}$, or its negation. We associate with each
propositional variable $x_k$, $1 \le k \le m$, a fresh variable $X_k$ (representing the positive literal) and a
fresh variable $\bar{X}_k$ (representing the negation of such literal). Also, with each propositional variable
$y_k$, $1 \le k \le t$, we associate fresh variables $Y_k$ and $\bar{Y}_k$. Then let $W = \{X_1, \ldots, X_m, \bar{X}_1, \ldots, \bar{X}_m\} \cup
\{Y_1, \ldots, Y_t, \bar{Y}_1, \ldots, \bar{Y}_t\} \cup \{Z\}$, where $Z$ is a fresh variable as well.

Next we define expression $e$ over $\Sigma = \{0, 1\}$ and $W$. We find it useful to split the definition in
two parts. First, define

$$e = (Z \cdot 0 \cdot e_1) \mid (1 \cdot Z \cdot e_2),$$

with

$$e_1 = 1100(0 \mid 1)^m 000$$

and

$$e_2 = e_{2,1,1} \mid \cdots \mid e_{2,1,m} \mid e_{2,2,1} \mid \cdots \mid e_{2,2,m} \mid e_{2,3,1} \mid \cdots \mid e_{2,3,t} \mid e_{2,4},$$

where

• for each $1 \le k \le m$, define $e_{2,1,k} = 1100(0 \mid 1)^{k-1} X_k (0 \mid 1)^{m-k} 000$;

• for each $1 \le k \le m$, define $e_{2,2,k} = (X_k \bar{X}_k 00(0 \mid 1)^m 000) \mid (11 X_k \bar{X}_k (0 \mid 1)^m 000)$;

• for each $1 \le k \le t$, define $e_{2,3,k} = (Y_k \bar{Y}_k 00(0 \mid 1)^m 000) \mid (11 Y_k \bar{Y}_k (0 \mid 1)^m 000)$;

• let $h$ be a function that maps each literal $\ell_j^i$ to the variable $X_k$, if $\ell_j^i$ corresponds to $x_k$, or
to $\bar{X}_k$, if $\ell_j^i$ corresponds to $\neg x_k$ ($1 \le j \le p$, $1 \le i \le 3$ and $1 \le k \le m$); or to $Y_k$, if $\ell_j^i$
corresponds to $y_k$, or to $\bar{Y}_k$, if $\ell_j^i$ corresponds to $\neg y_k$. Then define
$e_{2,4} = 1100(0 \mid 1)^m (h(\ell_1^1) \cdot h(\ell_1^2) \cdot h(\ell_1^3) \mid \cdots \mid h(\ell_p^1) \cdot h(\ell_p^2) \cdot h(\ell_p^3))$.

We now prove that $\mathcal{L}_\Box(e) \neq \emptyset$ if and only if there exists an assignment $\sigma_x$ for $\{x_1, \ldots, x_m\}$ such
that for every assignment $\sigma_y$ for $\{y_1, \ldots, y_t\}$ it is the case that $\varphi$ is not satisfied.

($\Leftarrow$): Assume first that there exists an assignment $\sigma_x$ for $\{x_1, \ldots, x_m\}$ such that for every
assignment $\sigma_y$ for $\{y_1, \ldots, y_t\}$ it is the case that $\varphi$ is not satisfied. Define $a_1, \ldots, a_m$ as follows:
for each $1 \le k \le m$, $a_k = 0$ if and only if $\sigma_x$ assigns the value 1 to the variable $x_k$. Let then
$w = 101100a_1 \cdots a_m 000$ be a word in $\Sigma^*$. We claim that $w \in \mathcal{L}_\Box(e)$. To prove this claim, let
$\nu : W \to \Sigma$ be a valuation. We show that $w \in \mathcal{L}(\nu(e))$. The proof is done via a case by case
analysis:
• Assume first that $\nu(Z) = 1$. Then, since $1100a_1 \cdots a_m 000$ is clearly denoted by $e_1$, we have
that $w \in \mathcal{L}(\nu(e))$.

• Next, assume that $\nu(Z) = 0$, and for some $1 \le k \le m$ it is the case that $\nu(X_k) = \nu(\bar{X}_k)$.
Then, it is easy to see that $1100a_1 \cdots a_m 000$ is denoted by the expression

$$\nu(e_{2,2,k}) = (\nu(X_k)\nu(\bar{X}_k)00(0 \mid 1)^m 000) \mid (11\nu(X_k)\nu(\bar{X}_k)(0 \mid 1)^m 000).$$

And thus we have that $w \in \mathcal{L}(\nu(e))$.

• Assume now that $\nu(Z) = 0$, and for some $1 \le k \le t$ it is the case that $\nu(Y_k) = \nu(\bar{Y}_k)$. Then,
it is easy to see that $1100a_1 \cdots a_m 000$ is denoted by the expression

$$\nu(e_{2,3,k}) = (\nu(Y_k)\nu(\bar{Y}_k)00(0 \mid 1)^m 000) \mid (11\nu(Y_k)\nu(\bar{Y}_k)(0 \mid 1)^m 000).$$

And thus we have that $w \in \mathcal{L}(\nu(e))$.

The remaining valuations are such that $\nu(Z) = 0$, $\nu(X_k) \neq \nu(\bar{X}_k)$ for each $1 \le k \le m$, and for each
$1 \le k \le t$ we have that $\nu(Y_k) \neq \nu(\bar{Y}_k)$. Let us continue.

• Assume now that for some $1 \le k \le m$ it is the case that $\nu(X_k) = 1$ but $\sigma_x(x_k) = 0$. Then,
$a_k = 1$, and thus it is easy to see that the word $1100a_1 \cdots a_m 000$ is denoted by $\nu(e_{2,1,k})$,
that corresponds to the expression $1100(0 \mid 1)^{k-1} \nu(X_k) (0 \mid 1)^{m-k} 000$, which entails that $w$ is
denoted by $\nu(e)$.

• Assume now that $\nu(Z) = 0$, $\nu(X_k) \neq \nu(\bar{X}_k)$ for each $1 \le k \le m$, for each $1 \le k \le t$
we have that $\nu(Y_k) \neq \nu(\bar{Y}_k)$, and for some $1 \le k \le m$ it is the case that $\nu(X_k) = 0$ but
$\sigma_x(x_k) = 1$. Then, $a_k = 0$, and thus it is easy to see that the word $1100a_1 \cdots a_m 000$ is
denoted by $\nu(e_{2,1,k})$, that corresponds to the expression $1100(0 \mid 1)^{k-1} \nu(X_k) (0 \mid 1)^{m-k} 000$,
which entails that $w$ is denoted by $\nu(e)$.

• Finally, assume that $\nu(Z) = 0$, $\nu(X_k) \neq \nu(\bar{X}_k)$ for each $1 \le k \le m$, for each $1 \le k \le t$
we have that $\nu(Y_k) \neq \nu(\bar{Y}_k)$, and for each $1 \le k \le m$ it is the case that $\nu(X_k) = \sigma_x(x_k)$.
Define the following valuation $\sigma_y$ for the variables in $\{y_1, \ldots, y_t\}$: $\sigma_y(y_k) = \nu(Y_k)$, for each
$1 \le k \le t$. From our initial assumption, there exists at least a clause $C_j$, $1 \le j \le p$, that is
falsified under the assignment $\sigma_x, \sigma_y$. From the fact that $\nu(X_k) \neq \nu(\bar{X}_k)$ for each $1 \le k \le m$,
for each $1 \le k \le t$ we have that $\nu(Y_k) \neq \nu(\bar{Y}_k)$, and for each $1 \le k \le m$ it is the case that
$\nu(X_k) = \sigma_x(x_k)$, we conclude that $\nu(h(\ell_j^1)) \cdot \nu(h(\ell_j^2)) \cdot \nu(h(\ell_j^3))$ corresponds to the string 000,
which proves that $1100a_1 \cdots a_m 000$ belongs to $\mathcal{L}(\nu(e_{2,4}))$, and thus $w$ is denoted by $\nu(e)$.

($\Rightarrow$): Assume now that there exists a word $w$ in $\mathcal{L}_\Box(e)$. It is straightforward to show that
$w$ must begin with the prefix 10: if $w$ begins with either 00 or 01 then it cannot be denoted by
$\nu_1(e)$, where $\nu_1$ assigns the letter 1 to all variables in $W$. Moreover, if it begins with 11 then it
cannot be accepted by the valuation that assigns the letter 0 to all variables. Let then $w = 10v$.
Furthermore, let $\nu$ be an arbitrary valuation such that $\nu(Z) = 1$. Since $w$ belongs to $\mathcal{L}(\nu(e))$,
$v$ must be denoted by $e_1$, and so it must be that $v$ is of form $1100(0 \mid 1)^m 000$. Let $\sigma_x$ be a valuation
for $\{x_1, \ldots, x_m\}$ defined as follows: $\sigma_x(x_k) = 1$ if $v$ is of form $1100(0 \mid 1)^{k-1} 0 (0 \mid 1)^{m-k} 000$, and
$\sigma_x(x_k) = 0$ if $v$ is of form $1100(0 \mid 1)^{k-1} 1 (0 \mid 1)^{m-k} 000$. Next, we show that for each valuation $\sigma_y$
for $\{y_1, \ldots, y_t\}$ it is the case that $\varphi$ is not satisfied with valuation $\sigma_x, \sigma_y$. Assume for the sake of
contradiction that there is a valuation $\sigma_y$ for $\{y_1, \ldots, y_t\}$ such that $\sigma_x, \sigma_y$ satisfies $\varphi$. Define the
following valuation $\nu : W \to \Sigma$:

• $\nu(Z) = 0$,

• $\nu(X_k) = \sigma_x(x_k)$, for each $1 \le k \le m$,

• $\nu(Y_k) = \sigma_y(y_k)$, for each $1 \le k \le t$,

• $\nu(\bar{X}_k) = 1$ if and only if $\nu(X_k) = 0$,

• $\nu(\bar{Y}_k) = 1$ if and only if $\nu(Y_k) = 0$.

Let us now show that $10v \notin \mathcal{L}(\nu(e))$, contradicting our initial assumption that $w = 10v$ belongs to
$\mathcal{L}_\Box(e)$. Since $\nu(Z) = 0$, the word $v$ would have to be denoted by $\nu(e_2)$. Since $\nu(X_k) \neq \nu(\bar{X}_k)$ for each
$1 \le k \le m$ and for each $1 \le k \le t$ we have that $\nu(Y_k) \neq \nu(\bar{Y}_k)$, it is easy to see that the word $v$ cannot
be in $\mathcal{L}(\nu(e_{2,2,k}))$ for $1 \le k \le m$ or in $\mathcal{L}(\nu(e_{2,3,k}))$ for $1 \le k \le t$. Moreover, since $\sigma_x(x_k) = 1 = \nu(X_k)$ if
$v$ is of form $1100(0 \mid 1)^{k-1} 0 (0 \mid 1)^{m-k} 000$, and $\sigma_x(x_k) = 0 = \nu(X_k)$ if $v$ is of form
$1100(0 \mid 1)^{k-1} 1 (0 \mid 1)^{m-k} 000$, we have that $v$ cannot be in $\mathcal{L}(\nu(e_{2,1,k}))$ for $1 \le k \le m$. The only
remaining possibility is that the word $v$ belongs to $\mathcal{L}(\nu(e_{2,4}))$. This implies that for some $1 \le j \le p$, it is
the case that $\nu(h(\ell_j^1)) \cdot \nu(h(\ell_j^2)) \cdot \nu(h(\ell_j^3))$ corresponds to the string 000. But we know from the
definition of $\nu$ that this cannot be true, as we have assumed that $\sigma_x, \sigma_y$ satisfies $\varphi$. We conclude that
$10v \notin \mathcal{L}(\nu(e))$, which was to be shown.

Proof of Theorem 2: 

The first part follows from the first part of Theorem 3. We prove the second part here. We use a
reduction from Positive 1-in-3 3-SAT, which is the following NP-hard decision problem: given a
conjunction $\varphi$ of clauses, with exactly three literals each, and in which no negated variable occurs,
is there a truth assignment to the variables so that each clause has exactly one true variable?

The reduction is as follows. Let $\varphi = C_1 \wedge \cdots \wedge C_m$ be a formula in CNF, where each $C_i$
($1 \le i \le m$) is a clause consisting of exactly three positive literals. Let $\{p_1, \ldots, p_n\}$ be the variables
that appear in $\varphi$. With each propositional variable $p_i$ ($1 \le i \le n$) we associate a different variable
$x_i \in V$. We show next how to construct, in polynomial time from $\varphi$, a parameterized regular
expression $e$ over alphabet $\Sigma = \{a, 0, 1\}$ and a word $w$ over the same alphabet, such that there is
an assignment to the variables of $\varphi$ for which each clause has exactly one true variable if and only
if $w \in \mathcal{L}_\Diamond(e)$.

The parameterized regular expression $e$ is defined as $a e_1 a e_2 a \cdots a e_m a$, where the regular
expression $e_i$, for $1 \le i \le m$, is defined as follows: assume that $C_i = (p_j \vee p_k \vee p_\ell)$, where $1 \le j, k, \ell \le n$.
Then $e_i$ is defined as $(x_j x_k x_\ell \mid x_j x_\ell x_k \mid x_k x_j x_\ell \mid x_k x_\ell x_j \mid x_\ell x_j x_k \mid x_\ell x_k x_j)$. That is, $e_i$ is just the union
of all the possible forms in which the variables in $V$ that correspond to the propositional variables
that appear in $C_i$ can be ordered. Further, the word $w$ is defined as $(a100)^m a$. Clearly, $e$ and $w$ can
be constructed in polynomial time from $\varphi$. Next we show that there is an assignment for variables
$\{p_1, \ldots, p_n\}$ for which each clause has exactly one true variable if and only if $w \in \mathcal{L}_\Diamond(e)$.

Assume first that $w \in \mathcal{L}_\Diamond(e)$. Then there exists a valuation $\nu : \{x_1, \ldots, x_n\} \to \Sigma$ such that
$w \in \mathcal{L}(\nu(e))$. Thus, it must be the case that the word $a100$ belongs to $\mathcal{L}(\nu(a e_i))$, for each $1 \le i \le m$.
But this implies that if $C_i = (p_j \vee p_k \vee p_\ell)$, then $\nu$ assigns value 1 to exactly one of the variables in the
set $\{x_j, x_k, x_\ell\}$ and it assigns value 0 to the other two variables. Let us define now a propositional
assignment $\sigma : \{p_1, \ldots, p_n\} \to \{0, 1\}$ such that $\sigma(p_i) = \nu(x_i)$, for each $1 \le i \le n$. It is not hard to
see then that for each clause $C_j$, $1 \le j \le m$, $\sigma$ assigns value 1 to exactly one of its propositional
variables.

Assume, on the other hand, that there is a propositional assignment $\sigma : \{p_1, \ldots, p_n\} \to \{0, 1\}$
that assigns value 1 to exactly one variable in each clause $C_i$, $1 \le i \le m$. Let us define $\nu$ as a
valuation from $\{x_1, \ldots, x_n\}$ into $\{0, 1\}$ such that $\nu(x_i) = 1$ if and only if $\sigma(p_i) = 1$. Clearly then
$100 \in \mathcal{L}(\nu(e_i))$, for each $1 \le i \le m$. Thus, $(a100)^m a \in \mathcal{L}(\nu(e))$. We conclude that $w \in \mathcal{L}_\Diamond(e)$.
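
The reduction can be checked mechanically on small formulas. The following sketch is an illustration under stated assumptions: clauses are triples of distinct variable indices, the variable $x_i$ is written as the hypothetical token `<xi>`, and membership under the possibility semantics is decided by brute-force enumeration of valuations rather than by any efficient procedure.

```python
import itertools
import re

def reduction(clauses):
    """Build (e, w) from clauses such as (1, 2, 3), standing for (p1 v p2 v p3)."""
    parts = []
    for clause in clauses:
        orders = ["".join(f"<x{i}>" for i in perm)
                  for perm in itertools.permutations(clause)]
        parts.append("(?:" + "|".join(orders) + ")")    # e_i: all six orderings
    e = "a" + "".join(part + "a" for part in parts)     # a e_1 a e_2 a ... a e_m a
    w = "a100" * len(clauses) + "a"                     # (a100)^m a
    return e, w

def in_possibility(e, variables, alphabet, w):
    """Try every valuation of the variable tokens over the alphabet."""
    for letters in itertools.product(alphabet, repeat=len(variables)):
        pattern = e
        for var, letter in zip(variables, letters):
            pattern = pattern.replace(var, letter)
        if re.fullmatch(pattern, w) is not None:
            return True
    return False

def exactly_one_true(clauses, num_vars):
    """Direct check of the SAT side, for comparison with the reduction."""
    return any(all(sum(bits[i - 1] for i in clause) == 1 for clause in clauses)
               for bits in itertools.product((0, 1), repeat=num_vars))

def check(clauses, num_vars):
    e, w = reduction(clauses)
    variables = [f"<x{i}>" for i in range(1, num_vars + 1)]
    return in_possibility(e, variables, "a01", w), exactly_one_true(clauses, num_vars)

print(check([(1, 2, 3), (1, 4, 5)], 5))
print(check([(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)], 4))
```

On both inputs the two components of the returned pair agree, as the proof predicts; the second formula has no 1-in-3 assignment because each variable occurs in three of the four clauses.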

Proof of Proposition 2: 

For the sake of readability, in this proof we use $\cup$, instead of $\mid$, for representing the operation of
union between regular expressions.

We first consider the $\Diamond$-semantics. Notice that the reduction used in the proof of Theorem 2, to
show NP-hardness of Membership$_\Diamond$, constructs a regular expression that is of star-height 0. This
shows NP-hardness of Membership$_\Diamond$ for expressions of star-height 0. We prove NP-hardness of
Membership$_\Diamond$ for simple expressions here.

We use a reduction from 3-SAT. Let $\varphi = \bigwedge_{1 \le i \le n} (\ell_i^1 \vee \ell_i^2 \vee \ell_i^3)$ be a propositional formula in
3-CNF over variables $\{p_1, \ldots, p_m\}$. That is, each literal $\ell_i^j$, for $1 \le i \le n$ and $1 \le j \le 3$, is either
$p_k$ or $\neg p_k$, for $1 \le k \le m$. Next, we show how to construct in polynomial time from $\varphi$ a simple
regular expression $e$ over alphabet $\Sigma = \{a, b, c, d, 0, 1\}$ and a word $w$ over the same alphabet such
that $\varphi$ is satisfiable if and only if $w \in \mathcal{L}_\Diamond(e)$.

The regular expression $e$ is defined as $f^*$, where $f := a(f_1 \cup g_1 \cup \cdots \cup f_m \cup g_m)b$, and the regular
expressions $f_i$ and $g_i$ are defined as follows.

Intuitively, $f_i$ (resp. $g_i$) codifies $p_i$ (resp. $\neg p_i$) and the clauses in which $p_i$ (resp. $\neg p_i$) appears.
Formally, we define $f_i$ ($1 \le i \le m$) as $((c^i \cup \bigcup_{\{1 \le j \le n \,\mid\, p_i = \ell_j^1 \text{ or } p_i = \ell_j^2 \text{ or } p_i = \ell_j^3\}} d^j) \cdot x_i)$, where $x_i$ is a
fresh variable in $V$. In the same way we define $g_i$ as $((c^i \cup \bigcup_{\{1 \le j \le n \,\mid\, \neg p_i = \ell_j^1 \text{ or } \neg p_i = \ell_j^2 \text{ or } \neg p_i = \ell_j^3\}} d^j) \cdot \bar{x}_i)$,
where $\bar{x}_i$ is a fresh variable in $V$. The variable $x_i$ (resp. $\bar{x}_i$) is said to be associated with $p_i$ (resp.
$\neg p_i$) in $e$.

Clearly, $e$ is a simple regular expression and can be constructed in polynomial time from $\varphi$.

The word $w$ is defined as:

$$ac1b \; ac0b \; acc1b \; acc0b \cdots ac^m1b \; ac^m0b \; ad1b \; add1b \cdots ad^n1b.$$

Clearly, $w$ can be constructed in polynomial time from $\varphi$. Next we show that $\varphi$ is satisfiable if and
only if $w \in \mathcal{L}_\Diamond(e)$.

Assume first that $w \in \mathcal{L}_\Diamond(e)$. That is, there is a valuation $\nu$ for the variables $\{x_1, \bar{x}_1, \ldots, x_m, \bar{x}_m\}$
over $\Sigma$ such that $w \in \mathcal{L}(\nu(e))$. But then, given the form of $w$, it is clear that $ac^i1b$ and $ac^i0b$ belong
to $\mathcal{L}(\nu(f))$, for each $1 \le i \le m$. Notice that the only way for this to happen is that both $\nu(x_i)$
and $\nu(\bar{x}_i)$ take their values in the set $\{0, 1\}$, and, further, $\nu(x_i) \neq \nu(\bar{x}_i)$. For the same reasons,
$ad^j1b \in \mathcal{L}(\nu(f))$, for each $1 \le j \le n$. But the only way for this to happen is that for each $1 \le j \le n$
it is the case that either the variable associated with $\ell_j^1$ or with $\ell_j^2$ or with $\ell_j^3$ in $e$ is assigned value
1 by $\nu$. Thus, the propositional assignment $\sigma : \{p_1, \ldots, p_m\} \to \{0, 1\}$, defined as $\sigma(p_i) = 1$ if and
only if $\nu(x_i) = 1$, is well-defined and satisfies $\varphi$.

Assume, on the other hand, that there is a satisfying propositional assignment $\sigma :
\{p_1, \ldots, p_m\} \to \{0, 1\}$ for $\varphi$. Consider the following valuation $\nu$ of $\{x_1, \bar{x}_1, \ldots, x_m, \bar{x}_m\}$ for $e$:
$\nu(x_i) = \sigma(p_i)$ and $\nu(\bar{x}_i) = 1 - \sigma(p_i)$. Using essentially the same techniques as in the previous
paragraph it is possible to show that $w \in \mathcal{L}(\nu(e))$, and, therefore, that $w \in \mathcal{L}_\Diamond(e)$.
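
As a sanity check, the reduction from 3-SAT can be run on tiny formulas. The sketch below makes several assumptions for illustration: a literal is a nonzero integer (`k` for $p_k$, `-k` for $\neg p_k$), the variables $x_i$ and $\bar{x}_i$ are written as the hypothetical tokens `<xi>` and `<nxi>`, and membership is decided by enumerating all valuations over the six-letter alphabet, which is feasible only for very small $m$.

```python
import itertools
import re

def reduction(clauses, m):
    """Build the simple expression e = f*, the word w, and the variable tokens."""
    options = []
    for i in range(1, m + 1):
        for literal, var in ((i, f"<x{i}>"), (-i, f"<nx{i}>")):
            # c^i, plus d^j for every clause j in which the literal occurs.
            heads = ["c" * i] + ["d" * j for j, c in enumerate(clauses, 1) if literal in c]
            options.append("(?:" + "|".join(heads) + ")" + var)
    e = "(?:a(?:" + "|".join(options) + ")b)*"
    w = "".join(f"a{'c' * i}1ba{'c' * i}0b" for i in range(1, m + 1))
    w += "".join(f"a{'d' * j}1b" for j in range(1, len(clauses) + 1))
    variables = [v for i in range(1, m + 1) for v in (f"<x{i}>", f"<nx{i}>")]
    return e, w, variables

def in_possibility(e, variables, alphabet, w):
    """Try every valuation of the variable tokens over the alphabet."""
    for letters in itertools.product(alphabet, repeat=len(variables)):
        pattern = e
        for var, letter in zip(variables, letters):
            pattern = pattern.replace(var, letter)
        if re.fullmatch(pattern, w) is not None:
            return True
    return False

def satisfiable(clauses, m):
    """Direct truth-table check, for comparison with the reduction."""
    return any(all(any((lit > 0) == bits[abs(lit) - 1] for lit in c) for c in clauses)
               for bits in itertools.product((False, True), repeat=m))

def check(clauses, m):
    e, w, variables = reduction(clauses, m)
    return in_possibility(e, variables, "abcd01", w), satisfiable(clauses, m)

print(check([(1, 1, 2), (-1, -1, 2)], 2))
print(check([(1, 1, 2), (1, 1, -2), (-1, -1, 2), (-1, -1, -2)], 2))
```

Each variable token occurs exactly once in the generated pattern, which is what makes the expression simple.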

Next we show that Membership$_\Diamond$ can be solved in time $O(mn \cdot \log n)$ for simple expressions
of star-height 0. Given a regular expression $e \in \mathrm{REG}(\Sigma, V)$ that is simple and of star-height 0, one
can construct in time $O(n \cdot \log n)$ [13] an $\varepsilon$-free NFA $A$ over $\Sigma \cup V$ that accepts precisely $\mathcal{L}(e)$, and
satisfies the following two properties: (1) its underlying directed graph is simple and acyclic (this is
because $e$ does not mention the Kleene star), and (2) for each $x \in V$ that is mentioned in $e$ there is
at most one pair $(q, q')$ of states of $A$ such that $A$ contains a transition from $q$ to $q'$ labeled $x$ (this
is because $e$ is simple). From Lemma 1, checking whether $w \in \mathcal{L}_\Diamond(e)$, for a given word $w \in \Sigma^*$, is
equivalent to checking whether $w \in \mathcal{L}(\nu(A))$, for some valuation $\nu$ for $A$. We show how the latter
can be done in polynomial time.

First, construct in time $O(m)$ a DFA $B$ over $\Sigma$ such that $\mathcal{L}(B) = \{w\}$. We assume without
loss of generality that the set $Q$ of states of $A$ is disjoint from the set $P$ of states of $B$. Next we
construct the following NFA $A'$ over the alphabet $\Sigma \cup (\mathcal{V} \times \Sigma)$. The set of states of $A'$ is
$Q \times P$. The initial state of $A'$ is the pair $(q_0, p_0)$, where $q_0$ is the initial state of $A$ and $p_0$ is the
initial state of $B$. The final states of $A'$ are precisely the pairs $(q, p) \in Q \times P$ such that $q$ is a final
state of $A$ and $p$ is a final state of $B$. Finally, there is a transition in $A'$ from state $(q, p)$ to state
$(q', p')$ labeled $a \in \Sigma$ if and only if there is a transition in $A$ from $q$ to $q'$ labeled $a$ and there is a
transition in $B$ from $p$ to $p'$ labeled $a$. There is a transition in $A'$ from state $(q, p)$ to state $(q', p')$
labeled $(x, a) \in \mathcal{V} \times \Sigma$ if and only if there is a transition in $A$ from $q$ to $q'$ labeled $x$ and there is a
transition in $B$ from $p$ to $p'$ labeled $a$. Clearly, this construction can be performed by checking all
combinations of transitions of $A$ and $B$, and thus it can be performed in time $O(mn \cdot \log n)$.
Checking whether $\mathcal{L}(A') \ne \emptyset$ can easily be done in linear time w.r.t. the size of $A'$, thus obtaining
the $O(mn \cdot \log n)$ bound. We prove next that checking this is equivalent to checking whether
$w \in \mathcal{L}(\nu(A))$, for some valuation $\nu$ for $A$, which finishes the proof of the proposition in terms of
the $\Diamond$-semantics.
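The product construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the NFA $A$ is given as a list of triples `(q, label, q2)`, labels in the set `variables` are variables and all other labels are letters, and the DFA $B$ for $w$ is represented implicitly by a position in `word`. The function name `possibly_accepts` and this encoding are hypothetical, not taken from the paper. The sketch is only meaningful for automata obtained from simple expressions of star-height 0, since it never checks that a variable is used consistently.

```python
from collections import deque

def possibly_accepts(transitions, variables, initial, finals, word):
    """Nonemptiness test for the product A' of A with the path DFA B of `word`.

    A product state (q, p) pairs a state q of A with a position p in `word`,
    which plays the role of a state of B.
    """
    out = {}
    for q, label, q2 in transitions:
        out.setdefault(q, []).append((label, q2))

    start = (initial, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        q, p = queue.popleft()
        if p == len(word):
            if q in finals:
                return True  # an accepting run of A' exists
            continue
        for label, q2 in out.get(q, []):
            # A letter transition must agree with word[p]; a variable
            # transition gives the product label (x, word[p]), which is always
            # available because x labels at most one transition of an acyclic A.
            if label in variables or label == word[p]:
                nxt = (q2, p + 1)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False
```

For instance, for the simple expression $x0y$ encoded as `[(0, "x", 1), (1, "0", 2), (2, "y", 3)]` with final state 3, the word `101` is accepted and `111` is not.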

Assume first that $\mathcal{L}(A') \ne \emptyset$. Let
$(q_0, p_0) \xrightarrow{u_1} (q_1, p_1) \xrightarrow{u_2} \cdots \xrightarrow{u_{n-1}} (q_{n-1}, p_{n-1}) \xrightarrow{u_n} (q_n, p_n)$
be an accepting run of $A'$. That is, $u_1 u_2 \cdots u_n \in (\Sigma \cup (\mathcal{V} \times \Sigma))^*$ and $(q_n, p_n)$ is a final state
of $A'$. Since the underlying directed graph of $A$ is acyclic, and each variable $x$ mentioned in $e$
appears in at most one transition of $A$, it must be the case that for each $1 \le i < j \le n$, if
$u_i = (x_i, a_i) \in \mathcal{V} \times \Sigma$ and $u_j = (x_j, a_j) \in \mathcal{V} \times \Sigma$ then $x_i \ne x_j$. This implies that we can define a
mapping $\nu : W \to \Sigma$, where $W$ is the set of variables used in transitions of $A$, such that $\nu(x) = a$
if $u_i = (x, a)$ for some $1 \le i \le n$, and $\nu(x)$ is an arbitrary element $a' \in \Sigma$ otherwise. It is
not hard to see that $q_0 \xrightarrow{a_1} q_1 \xrightarrow{a_2} \cdots \xrightarrow{a_{n-1}} q_{n-1} \xrightarrow{a_n} q_n$ is an accepting run of $\nu(A)$
and that $a_1 a_2 \cdots a_n = w$, where $a_i$ denotes the letter of $\Sigma$ carried by $u_i$. This can be proved as
follows. Let $f : \{u_1, \ldots, u_n\} \to \Sigma$ be the mapping such that $f(u_i) = u_i$ if $u_i = a \in \Sigma$, and
$f(u_i) = a$ if $u_i = (x, a) \in \mathcal{V} \times \Sigma$. Then clearly
$p_0 \xrightarrow{f(u_1)} p_1 \xrightarrow{f(u_2)} \cdots \xrightarrow{f(u_{n-1})} p_{n-1} \xrightarrow{f(u_n)} p_n$ is an accepting run of $B$ and, therefore,
$w = f(u_1) \cdots f(u_n)$. Further, let $g : \{u_1, \ldots, u_n\} \to \Sigma$ be the mapping such that $g(u_i) = u_i$ if
$u_i = a \in \Sigma$, and $g(u_i) = \nu(x) = a$ if $u_i = (x, a) \in \mathcal{V} \times \Sigma$. Then clearly $f(u_i) = g(u_i)$, for each
$1 \le i \le n$, and, further, $q_0 \xrightarrow{g(u_1)} q_1 \xrightarrow{g(u_2)} \cdots \xrightarrow{g(u_{n-1})} q_{n-1} \xrightarrow{g(u_n)} q_n$ is an accepting run of $\nu(A)$.

We conclude that $w \in \mathcal{L}(\nu(A))$.

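Assume, on the other hand, that $w \in \mathcal{L}(\nu(A))$, for some valuation $\nu$ for $A$. Suppose that
$w = a_1 a_2 \cdots a_n$, where each $a_i \in \Sigma$ ($1 \le i \le n$), and let
$q_0 \xrightarrow{a_1} q_1 \xrightarrow{a_2} \cdots \xrightarrow{a_{n-1}} q_{n-1} \xrightarrow{a_n} q_n$ be an
accepting run of $\nu(A)$; i.e. $q_n$ is a final state of $A$. Assume that $i_1 < i_2 < \cdots < i_m$ are the only
indexes in the set $\{0, 1, \ldots, n-1\}$ such that, for each $1 \le j \le m$, there is no transition labeled
$a_{i_j+1}$ from $q_{i_j}$ to $q_{i_j+1}$ in $A$. Then there must be a transition in $A$ from $q_{i_j}$ to $q_{i_j+1}$ labeled $x_{i_j} \in \mathcal{V}$.

Consider an arbitrary accepting run $p_0 \xrightarrow{a_1} p_1 \xrightarrow{a_2} \cdots \xrightarrow{a_{n-1}} p_{n-1} \xrightarrow{a_n} p_n$ of $B$; i.e. $p_n$ is a final state
of $B$. Then it is clear that

$$(q_0, p_0) \xrightarrow{a_1} (q_1, p_1) \cdots (q_{i_1}, p_{i_1}) \xrightarrow{(x_{i_1},\, a_{i_1+1})} (q_{i_1+1}, p_{i_1+1}) \cdots (q_{i_m}, p_{i_m}) \xrightarrow{(x_{i_m},\, a_{i_m+1})} (q_{i_m+1}, p_{i_m+1}) \cdots (q_{n-1}, p_{n-1}) \xrightarrow{a_n} (q_n, p_n)$$

is an accepting run of $A'$. Thus, $\mathcal{L}(A') \ne \emptyset$.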

Now we deal with the $\Box$-semantics. That Membership$_\Box$ is coNP-hard, even over the class
of expressions of star-height 0, follows from Theorem 3. Next we prove that Membership$_\Box$ is
coNP-hard, even over the class of simple regular expressions.

We use a reduction from 3-SAT to the complement of Membership$_\Box$ over the class of simple
expressions. Let $\varphi = \bigwedge_{1 \le i \le n} (\ell_i^1 \vee \ell_i^2 \vee \ell_i^3)$ be a propositional formula in 3-CNF over variables
$\{p_1, \ldots, p_m\}$. That is, each literal $\ell_i^j$, for $1 \le i \le n$ and $1 \le j \le 3$, is either $p_k$ or $\neg p_k$, for
$1 \le k \le m$. Next, we show how to construct in polynomial time from $\varphi$ a simple regular expression
$e$ over alphabet $\Sigma = \{a, b, 0, 1\}$ and a word $w$ over the same alphabet such that $\varphi$ is satisfiable if
and only if $w \notin \mathcal{L}_\Box(e)$.

We start by defining the word $w$ as follows:

$$
\begin{aligned}
w := {} & 1111a\,11111a\,b\;1110a\,11110a\,b\;111111a\,1111111a\,b\;111110a\,1111110a\,b \cdots \\
& 1^{2i+1}1a\,1^{2i+2}1a\,b\;1^{2i+1}0a\,1^{2i+2}0a\,b \cdots 1^{2m+1}1a\,1^{2m+2}1a\,b\;1^{2m+1}0a\,1^{2m+2}0a\,b \\
& 1^{3(m+1)+1}0a\,1^{3(m+1)+2}0a\,1^{3(m+1)+3}0a\,b \cdots 1^{3j(m+1)+1}0a\,1^{3j(m+1)+2}0a\,1^{3j(m+1)+3}0a\,b \cdots \\
& 1^{3n(m+1)+1}0a\,1^{3n(m+1)+2}0a\,1^{3n(m+1)+3}0a\,b\,b\,a\,a.
\end{aligned}
$$

We denote by $w'$ the prefix of $w$ such that $w = w'aa$ and by $w''$ the prefix of $w$ such that $w = w''baa$.
Clearly, $w$ can be constructed in polynomial time from $\varphi$. Next we show that $\varphi$ is satisfiable if and
only if $w \notin \mathcal{L}_\Box(e)$.

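The regular expression $e$ is defined as $(\Sigma^* b \cup \varepsilon)\, f\, (b\Sigma^* \cup \varepsilon)$, where $f$ is defined as
$\big((f_1 \cup g_1 \cup \cdots \cup f_m \cup g_m)(a \cup \varepsilon)\big)^{+}$. Intuitively, $f_i$ (resp. $g_i$) codifies $p_i$ (resp. $\neg p_i$) and the clauses in which $p_i$
(resp. $\neg p_i$) appears. Formally, we define $f_i$ ($1 \le i \le m$) as

$$\Big(\big(\{w'\} \cup \{w''\} \cup 1^{2i+1} \cup \bigcup_{\{1 \le j \le n \,\mid\, p_i = \ell_j^1\}} 1^{3j(m+1)+1} \cup \bigcup_{\{1 \le j \le n \,\mid\, p_i = \ell_j^2\}} 1^{3j(m+1)+2} \cup \bigcup_{\{1 \le j \le n \,\mid\, p_i = \ell_j^3\}} 1^{3j(m+1)+3}\big) \cdot x_i a\Big),$$

where $x_i$ is a fresh variable in $\mathcal{V}$. In the same way we define $g_i$ as

$$\Big(\big(\{w'\} \cup \{w''\} \cup 1^{2i+2} \cup \bigcup_{\{1 \le j \le n \,\mid\, \neg p_i = \ell_j^1\}} 1^{3j(m+1)+1} \cup \bigcup_{\{1 \le j \le n \,\mid\, \neg p_i = \ell_j^2\}} 1^{3j(m+1)+2} \cup \bigcup_{\{1 \le j \le n \,\mid\, \neg p_i = \ell_j^3\}} 1^{3j(m+1)+3}\big) \cdot \bar{x}_i a\Big),$$

where $\bar{x}_i$ is a fresh variable in $\mathcal{V}$. The variable $x_i$ (resp. $\bar{x}_i$) is said to be associated with $p_i$ (resp.
$\neg p_i$) in $e$. Clearly, $e$ is a simple regular expression and can be constructed in polynomial time from $\varphi$.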

We prove first that if $w \notin \mathcal{L}_\Box(e)$ then $\varphi$ is satisfiable. Assume that $w \notin \mathcal{L}_\Box(e)$. Then there exists
a valuation $\nu : \{x_1, \bar{x}_1, \ldots, x_m, \bar{x}_m\} \to \Sigma$ such that $w \notin \mathcal{L}(\nu(e))$. First of all, we prove that for each
$1 \le i \le m$ both $\nu(x_i)$ and $\nu(\bar{x}_i)$ belong to the set $\{0, 1\}$. Assume, for the sake of contradiction,
that this is not the case. Suppose first that $\nu(x_i) = a$, for some $1 \le i \le m$. Then it is clear that
$\mathcal{L}(w'aa) \subseteq \mathcal{L}(\nu(e))$ (because $\mathcal{L}(w'\nu(x_i)a) \subseteq \mathcal{L}(\nu(e))$). But $w = w'aa$ and, therefore, $w \in \mathcal{L}(\nu(e))$,
which is a contradiction. Suppose now that $\nu(x_i) = b$, for some $1 \le i \le m$. Then, again, it is clear
that $\mathcal{L}(w''baa) \subseteq \mathcal{L}(\nu(e))$ (because $\mathcal{L}(w''\nu(x_i)aa) \subseteq \mathcal{L}(\nu(e))$). As in the previous case, $w = w''baa$
and, therefore, $w \in \mathcal{L}(\nu(e))$, which is a contradiction. The other case, when $\nu(\bar{x}_i) \in \{a, b\}$ for
some $1 \le i \le m$, is completely analogous.

Next we prove that for each $1 \le i \le m$ it is the case that $\nu(\bar{x}_i) = 1 - \nu(x_i)$. Assume otherwise.
Then for some $1 \le i \le m$ it is the case that $\nu(x_i) = \nu(\bar{x}_i)$. Suppose first that $\nu(x_i) = \nu(\bar{x}_i) = 1$.
Consider the unique prefix $w_1$ of $w$ that is of the form $u\,1^{2i+1}1a\,1^{2i+2}1a$, for $u \in \Sigma^*$. Then $w$ is of the
form $w_1 w_2$, where $w_2 \in b\Sigma^*$. Since $w \notin \mathcal{L}(\nu(e))$, it must be the case that $w \notin \mathcal{L}((\Sigma^* b \cup \varepsilon)\nu(f) b\Sigma^*)$.
It follows that $w_1 \notin \mathcal{L}((\Sigma^* b \cup \varepsilon)\nu(f))$. But since $w_1$ is of the form $u\,1^{2i+1}1a\,1^{2i+2}1a$, it follows that
$u = \varepsilon$ or $u = u'b$, for some $u' \in \Sigma^*$. In any case it must hold that $1^{2i+1}1a\,1^{2i+2}1a \notin \mathcal{L}(\nu(f))$. Notice,
however, that $\mathcal{L}(1^{2i+1}\nu(x_i)a\,1^{2i+2}\nu(\bar{x}_i)a) \subseteq \mathcal{L}(\nu(f))$. Hence, $1^{2i+1}1a\,1^{2i+2}1a \in \mathcal{L}(\nu(f))$, which is a
contradiction. Suppose, on the other hand, that $\nu(x_i) = \nu(\bar{x}_i) = 0$. Consider the unique prefix
$w_1$ of $w$ that is of the form $u\,1^{2i+1}0a\,1^{2i+2}0a$, for $u \in \Sigma^*$. Then $w$ is of the form $w_1 w_2$, where
$w_2 \in b\Sigma^*$. Since $w \notin \mathcal{L}(\nu(e))$, it must be the case that $w \notin \mathcal{L}((\Sigma^* b \cup \varepsilon)\nu(f) b\Sigma^*)$. It follows that
$w_1 \notin \mathcal{L}((\Sigma^* b \cup \varepsilon)\nu(f))$. But since $w_1$ is of the form $u\,1^{2i+1}0a\,1^{2i+2}0a$, it follows that $u = \varepsilon$ or $u = u'b$,
for some $u' \in \Sigma^*$. In any case it must hold that $1^{2i+1}0a\,1^{2i+2}0a \notin \mathcal{L}(\nu(f))$. Notice, however, that
$\mathcal{L}(1^{2i+1}\nu(x_i)a\,1^{2i+2}\nu(\bar{x}_i)a) \subseteq \mathcal{L}(\nu(f))$. Hence, $1^{2i+1}0a\,1^{2i+2}0a \in \mathcal{L}(\nu(f))$, which is a contradiction.

We can then define a propositional assignment $\sigma : \{p_1, \ldots, p_m\} \to \{0, 1\}$ such that $\sigma(p_i) = \nu(x_i)$,
for each $1 \le i \le m$. Notice, from our previous remarks, that $\sigma(\neg p_i) = 1 - \nu(x_i) = \nu(\bar{x}_i)$, for each
$1 \le i \le m$. We prove next that $\sigma$ satisfies $\varphi$. Assume this is not the case. Then for some $1 \le j \le n$ it
is the case that $\sigma(\ell_j^1) = \sigma(\ell_j^2) = \sigma(\ell_j^3) = 0$. Consider now the unique prefix $w_1$ of $w$ such that $w_1$ is of
the form $u b\,1^{3j(m+1)+1}0a\,1^{3j(m+1)+2}0a\,1^{3j(m+1)+3}0a$, for $u \in \Sigma^*$. Then $w$ is of the form $w_1 w_2$, where
$w_2 \in b\Sigma^*$. Since $w \notin \mathcal{L}(\nu(e))$, it must be the case that $w \notin \mathcal{L}(\Sigma^* b\,\nu(f)\,b\Sigma^*)$. It follows that
$w_1 \notin \mathcal{L}(\Sigma^* b\,\nu(f))$. But since $w_1$ is of the form $u b\,1^{3j(m+1)+1}0a\,1^{3j(m+1)+2}0a\,1^{3j(m+1)+3}0a$, it follows that
$1^{3j(m+1)+1}0a\,1^{3j(m+1)+2}0a\,1^{3j(m+1)+3}0a \notin \mathcal{L}(\nu(f))$. Let $q_1$, $q_2$ and $q_3$ be the variables in $e$ associated
with $\ell_j^1$, $\ell_j^2$ and $\ell_j^3$, respectively. Then it cannot be the case that $\nu(q_1) = \nu(q_2) = \nu(q_3) = 0$. Assume
otherwise. It is clear that $\mathcal{L}(1^{3j(m+1)+1}\nu(q_1)a\,1^{3j(m+1)+2}\nu(q_2)a\,1^{3j(m+1)+3}\nu(q_3)a) \subseteq \mathcal{L}(\nu(f))$ and,
therefore, $1^{3j(m+1)+1}0a\,1^{3j(m+1)+2}0a\,1^{3j(m+1)+3}0a \in \mathcal{L}(\nu(f))$, which is a contradiction. Thus, either
$\nu(q_1) = \sigma(\ell_j^1) = 1$ or $\nu(q_2) = \sigma(\ell_j^2) = 1$ or $\nu(q_3) = \sigma(\ell_j^3) = 1$. This is our desired contradiction.

We prove second that if $\varphi$ is satisfiable then $w \notin \mathcal{L}_\Box(e)$. Assume that $\varphi$ is satisfiable. Then
there exists a propositional assignment $\sigma : \{p_1, \ldots, p_m\} \to \{0, 1\}$ that satisfies $\varphi$. We define a
valuation $\nu : \{x_1, \bar{x}_1, \ldots, x_m, \bar{x}_m\} \to \{0, 1\}$ for $e$ as follows: for each $1 \le i \le m$ it is the case that
$\nu(x_i) = \sigma(p_i)$ and $\nu(\bar{x}_i) = 1 - \sigma(p_i)$. We prove next that $w \notin \mathcal{L}(\nu(e))$.

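Clearly, $w \notin \mathcal{L}(\nu(e))$ if and only if for all words $w_1, w_2, w_3 \in \Sigma^*$ such that $w = w_1 w_2 w_3$ it is
the case that $w_1 \notin \mathcal{L}(\Sigma^* b \cup \varepsilon)$ or $w_2 \notin \mathcal{L}(\nu(f))$ or $w_3 \notin \mathcal{L}(b\Sigma^* \cup \varepsilon)$. Thus, in order to prove that
$w \notin \mathcal{L}(\nu(e))$ it is enough to prove that for all words $w_1, w_2, w_3 \in \Sigma^*$ such that $w = w_1 w_2 w_3$,

(*) if $w_1 \in \mathcal{L}(\Sigma^* b \cup \varepsilon)$ and $w_3 \in \mathcal{L}(b\Sigma^* \cup \varepsilon)$ then $w_2 \notin \mathcal{L}(\nu(f))$.

Take arbitrary words $w_1, w_2, w_3 \in \Sigma^*$ such that $w = w_1 w_2 w_3$. We consider several cases: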

1. Either $w_1 \notin \mathcal{L}(\Sigma^* b \cup \varepsilon)$ or $w_3 \notin \mathcal{L}(b\Sigma^* \cup \varepsilon)$. Then (*) is trivially true.

2. It is the case that $w_1 \in \mathcal{L}(\Sigma^* b \cup \varepsilon)$, $w_3 \in \mathcal{L}(b\Sigma^* \cup \varepsilon)$, and $w_2$ is of the form $1^{2i+1}1a\,1^{2i+2}1a\,u$,
for some $1 \le i \le m$ and $u \in \Sigma^*$. Assume, for the sake of contradiction, that $w_2 \in \mathcal{L}(\nu(f))$.
Since clearly there is no word in $\mathcal{L}(\nu(f))$ with suffix $baa$, it must be the case that
$w_3$ is not the empty word and, therefore, that $w_3 \in \mathcal{L}(b\Sigma^*)$. Thus, the only possibility for
$w_2$ to belong to $\mathcal{L}(\nu(f))$ is that $1^{2i+1}1a \in \mathcal{L}(\nu(f_i))$ and $1^{2i+2}1a \in \mathcal{L}(\nu(g_i))$. But this can only
happen if $\nu(x_i) = 1$ and $\nu(\bar{x}_i) = 1$, which is our desired contradiction (since $\nu(\bar{x}_i) = 1 - \nu(x_i)$).

3. It holds that $w_1 \in \mathcal{L}(\Sigma^* b \cup \varepsilon)$, $w_3 \in \mathcal{L}(b\Sigma^* \cup \varepsilon)$, and $w_2$ is of the form $1^{2i+1}0a\,1^{2i+2}0a\,u$, for
some $1 \le i \le m$ and $u \in \Sigma^*$. This case is completely analogous to the previous one.

4. It is the case that $w_1 \in \mathcal{L}(\Sigma^* b \cup \varepsilon)$, $w_3 \in \mathcal{L}(b\Sigma^* \cup \varepsilon)$, and $w_2$ is of the form
$1^{3j(m+1)+1}0a\,1^{3j(m+1)+2}0a\,1^{3j(m+1)+3}0a\,u$, for some $1 \le j \le n$ and $u \in \Sigma^*$. Assume, for
the sake of contradiction, that $w_2 \in \mathcal{L}(\nu(f))$. It is easy to see that the only way in which this
can happen is that $\nu(q_1) = \nu(q_2) = \nu(q_3) = 0$, where $q_1$, $q_2$ and $q_3$ are the variables in $e$ that
are associated with $\ell_j^1$, $\ell_j^2$ and $\ell_j^3$, respectively. Thus, $\sigma(\ell_j^1) = \sigma(\ell_j^2) = \sigma(\ell_j^3) = 0$, which is our
desired contradiction.

This finishes the proof of the proposition. 

Proof of Theorem 3: 

The hardness is established via a reduction from the complement of 3-SAT that is based on the
$\Sigma_2^p$-hardness proof of Proposition 1.

Let $\varphi := C_1 \wedge \cdots \wedge C_p$ be an instance of 3-SAT that uses variables $\{y_1, \ldots, y_t\}$.

Let $w$ be the word $0011000$. From $\varphi$ we construct in polynomial time a parameterized regular
expression $e$ over alphabet $\Sigma = \{0, 1\}$ such that $\varphi$ is not satisfiable if and only if $\mathcal{L}_\Box(e)$ contains $w$.

Assume that each $C_j$ ($1 \le j \le p$) is of the form $(\ell_j^1 \vee \ell_j^2 \vee \ell_j^3)$, where each literal $\ell_j^i$, for $1 \le j \le p$
and $1 \le i \le 3$, is either a variable in $\{y_1, \ldots, y_t\}$ or its negation.
With each propositional variable $y_k$, $1 \le k \le t$, we associate fresh variables $Y_k$ and $\bar{Y}_k$. Let
$h$ be a function that maps each literal $\ell_j^i$ to the variable $Y_k$, if $\ell_j^i$ corresponds to $y_k$, or to $\bar{Y}_k$, if
$\ell_j^i$ corresponds to $\neg y_k$. Expression $e$ is defined as

$$e = e_1 \mid e_{2,1} \mid \cdots \mid e_{2,t},$$

where $e_1 = 0011\big(h(\ell_1^1) \cdot h(\ell_1^2) \cdot h(\ell_1^3) \mid \cdots \mid h(\ell_p^1) \cdot h(\ell_p^2) \cdot h(\ell_p^3)\big)$, and for each $1 \le k \le t$,

$$e_{2,k} = (Y_k \bar{Y}_k 11000) \mid (00 \bar{Y}_k Y_k 000).$$

The proof that $w \in \mathcal{L}_\Box(e)$ if and only if $\varphi$ is not satisfiable goes along the same lines as the
$\Sigma_2^p$-hardness proof of Proposition 1.

Second, we prove that, for each word $w \in \Sigma^*$, the problem Membership$_\Box(w)$ can be solved in
polynomial time (actually, in linear time with respect to the size of the expression).

In order to do this, we first define a high-level procedure CheckSimpleMemb$_\Box$ that takes as input
a simple parameterized regular expression $e$ over $\Sigma$ and a finite set $W \subseteq \Sigma^*$, and checks whether
there exists a valuation $\nu$ for $e$ such that no word from $W$ belongs to $\mathcal{L}(\nu(e))$. Then the answer
to Membership$_\Box(w)$ for an expression $e$ is $\neg$CheckSimpleMemb$_\Box(e, \{w\})$.

The procedure CheckSimpleMemb$_\Box$ works recursively on input $e$ and $W$. For each internal node
of the parse tree of $e$ it iterates over some sets $W_1$ (or pairs of sets $(W_1, W_2)$, respectively), and for
each such set (or pair) calls itself recursively on the children of the analyzed node. If, for at least one
of the sets (or pairs), the returned answers satisfy a given condition, the call accepts.

The details of the definition of CheckSimpleMemb$_\Box$ are as follows:

1. If $e = a$, for $a \in \Sigma$, then CheckSimpleMemb$_\Box(e, W)$ accepts iff $a \notin W$.

2. If $e = x$, for $x \in \mathcal{V}$, then CheckSimpleMemb$_\Box(e, W)$ accepts iff $W$ does not contain all one-letter
words.

3. If $e$ is of the form $e_1 \cup e_2$, then CheckSimpleMemb$_\Box(e, W)$ accepts iff CheckSimpleMemb$_\Box(e_1, W)$
accepts and CheckSimpleMemb$_\Box(e_2, W)$ accepts.

4. If $e$ is of the form $e_1 e_2$, then CheckSimpleMemb$_\Box(e, W)$ accepts iff there exist sets $W_1 \subseteq \Sigma^*$,
$W_2 \subseteq \Sigma^*$ such that: (1) for each word $w_1 w_2 \in W$ either $w_1 \in W_1$ or $w_2 \in W_2$, and (2)
CheckSimpleMemb$_\Box(e_1, W_1)$ accepts, and (3) CheckSimpleMemb$_\Box(e_2, W_2)$ accepts.

5. If $e$ is of the form $(e_1)^*$, then CheckSimpleMemb$_\Box(e, W)$ accepts iff there is a set $W_1 \subseteq \Sigma^*$
such that: (1) for each word $w_1 w_2 \cdots w_k \in W$ at least one $w_i$ ($1 \le i \le k$) belongs to $W_1$, and
(2) CheckSimpleMemb$_\Box(e_1, W_1)$ accepts.

It is worth seeing why CheckSimpleMemb$_\Box$ needs to operate on sets of words instead of single
words. The above procedure may construct non-singleton sets in the cases of concatenation and Kleene
star, and we cannot analyse their elements separately, because in each case we must decide the existence
of a single valuation $\nu$ that simultaneously prevents all possible runs of $w$ on $\nu(e)$ from being
accepting.
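The five rules above can be turned into a small brute-force sketch in Python. This is an illustration only, not the linear-time implementation discussed later: it enumerates every candidate set $W_1$ (or pair $(W_1, W_2)$) explicitly, so it is exponential in the length of the words in $W$. The tuple encoding of expressions, the extra `"eps"` node, and the names `check` and `certainly_member` are assumptions made for this example. Splits $w_1 w_2$ may have empty parts, while star decompositions use nonempty pieces only and reject outright when $\varepsilon \in W$.

```python
from itertools import product

# Hypothetical encoding of simple expressions as nested tuples:
#   ("sym", a)  ("var", x)  ("eps",)  ("alt", e1, e2)  ("cat", e1, e2)  ("star", e1)

def compositions(w):
    """Yield every way to cut a nonempty word into nonempty consecutive pieces."""
    for i in range(1, len(w) + 1):
        head, rest = w[:i], w[i:]
        if not rest:
            yield (head,)
        else:
            for tail in compositions(rest):
                yield (head,) + tail

def check(e, W, sigma):
    """True iff some valuation nu puts every word of W outside L(nu(e))."""
    W = frozenset(W)
    kind = e[0]
    if kind == "sym":
        return e[1] not in W
    if kind == "var":
        return any(a not in W for a in sigma)  # some letter is still free
    if kind == "eps":
        return "" not in W
    if kind == "alt":
        return check(e[1], W, sigma) and check(e[2], W, sigma)
    if kind == "cat":
        splits = [(w[:i], w[i:]) for w in W for i in range(len(w) + 1)]
        # For each split, put the left part in W1 (0) or the right part in W2 (1).
        for choice in product((0, 1), repeat=len(splits)):
            W1 = {s[0] for s, c in zip(splits, choice) if c == 0}
            W2 = {s[1] for s, c in zip(splits, choice) if c == 1}
            if check(e[1], W1, sigma) and check(e[2], W2, sigma):
                return True
        return False
    if kind == "star":
        if "" in W:
            return False  # the empty word always matches a starred expression
        decomps = [d for w in W for d in compositions(w)]
        # Pick one piece from every decomposition of every word.
        return any(check(e[1], set(pieces), sigma) for pieces in product(*decomps))
    raise ValueError("unknown node: %r" % (kind,))

def certainly_member(e, w, sigma):
    """Membership under the certainty semantics, for simple expressions only."""
    return not check(e, {w}, sigma)
```

For example, over $\Sigma = \{0, 1\}$ the word `01` is not certainly denoted by $xy$ (take $\nu(x) = 1$), whereas `0` is certainly denoted by $(0 \cup 1)x^*$.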

Now we prove that the procedure described above is sound and complete; that is, we prove that
for each simple expression $e$ over $\Sigma$ and $W \subseteq \Sigma^*$, CheckSimpleMemb$_\Box$ accepts input $e$ and $W$ iff
there exists a valuation $\nu$ for $e$ such that no word in $W$ belongs to $\mathcal{L}(\nu(e))$. We do this by induction:
1. The base cases, when $e = a$ for $a \in \Sigma$, or $e = x$ for $x \in \mathcal{V}$, are trivial.

2. Assume $e$ is of the form $e_1 \cup e_2$. Then there is a valuation $\nu$ for $e$ such that no word in
$W$ belongs to $\mathcal{L}(\nu(e))$ iff there is a valuation $\nu$ for $e$ such that for each $w \in W$ we have
$w \notin \mathcal{L}(\nu(e_1))$ and $w \notin \mathcal{L}(\nu(e_2))$. But since we consider only simple expressions here, the
latter holds iff there are valuations $\nu_1$ for $e_1$ and $\nu_2$ for $e_2$ such that (a) no word $w \in W$
belongs to $\mathcal{L}(\nu_1(e_1))$, and (b) no word $w \in W$ belongs to $\mathcal{L}(\nu_2(e_2))$. By induction hypothesis,
the latter holds iff CheckSimpleMemb$_\Box(e_1, W)$ accepts and CheckSimpleMemb$_\Box(e_2, W)$ accepts,
which, by definition, is equivalent to the fact that CheckSimpleMemb$_\Box(e, W)$ accepts.

3. Assume $e$ is of the form $e_1 e_2$. Then there is a valuation $\nu$ for $e$ such that no word $w \in W$
belongs to $\mathcal{L}(\nu(e))$ iff there is a valuation $\nu$ for $e$ such that for each word $w_1 w_2 \in W$ either
$w_1 \notin \mathcal{L}(\nu(e_1))$ or $w_2 \notin \mathcal{L}(\nu(e_2))$. But since we consider only simple expressions here, the
latter holds iff there are valuations $\nu_1$ for $e_1$ and $\nu_2$ for $e_2$ such that for each $w_1 w_2 \in W$ either
$w_1 \notin \mathcal{L}(\nu_1(e_1))$ or $w_2 \notin \mathcal{L}(\nu_2(e_2))$.

Clearly, the latter holds iff there are valuations $\nu_1$ for $e_1$ and $\nu_2$ for $e_2$ and there are finite
sets $W_1, W_2 \subseteq \Sigma^*$ such that: (1) for each $w_1 w_2 \in W$ either $w_1 \in W_1$ or $w_2 \in W_2$, and
(2) no word $w_1 \in W_1$ belongs to $\mathcal{L}(\nu_1(e_1))$, and (3) no word $w_2 \in W_2$ belongs to $\mathcal{L}(\nu_2(e_2))$.
By induction hypothesis, the latter holds iff there are finite sets $W_1, W_2 \subseteq \Sigma^*$ such that for
each word $w_1 w_2 \in W$ either $w_1 \in W_1$ or $w_2 \in W_2$, and both CheckSimpleMemb$_\Box(e_1, W_1)$ and
CheckSimpleMemb$_\Box(e_2, W_2)$ accept. By definition, the latter is equivalent to the fact that
CheckSimpleMemb$_\Box(e, W)$ accepts.

4. Assume $e$ is of the form $(e_1)^*$. Then there is a valuation $\nu$ for $e$ such that no word $w \in W$
belongs to $\mathcal{L}(\nu(e))$ iff there is a valuation $\nu_1$ for $e_1$ such that for each $w_1 w_2 \cdots w_k \in W$ some
$w_i$ ($1 \le i \le k$) does not belong to $\mathcal{L}(\nu_1(e_1))$. Clearly, the latter holds iff there is a valuation
$\nu_1$ for $e_1$ and there is a finite set $W_1 \subseteq \Sigma^*$ such that: (1) for each word $w \in W$ and each of its
decompositions $w = w_1 w_2 \cdots w_k$ some $w_i$ ($1 \le i \le k$) belongs to $W_1$, and (2) no word from $W_1$
belongs to $\mathcal{L}(\nu_1(e_1))$. By induction hypothesis, the latter holds iff CheckSimpleMemb$_\Box(e_1, W_1)$
accepts for some set $W_1 \subseteq \Sigma^*$ satisfying condition (1), which, by definition, is equivalent to
the fact that CheckSimpleMemb$_\Box(e, W)$ accepts.

Next we show that there is an implementation of the procedure CheckSimpleMemb$_\Box$ that works
in $O(|e|)$ time, if we assume that the input consists of a simple parameterized regular expression $e$
and a fixed set of words $W$.

First, for technical reasons, we remove all subexpressions $\varepsilon$ from $e$, obtaining $e'$. Then the
implementation works recursively as follows. If $e'$ is of the form $a$, for $a \in \Sigma$, or $x \in \mathcal{V}$, or $e_1 \cup e_2$,
then we proceed recursively exactly as described in CheckSimpleMemb$_\Box$. If, on the
other hand, $e'$ is of the form $e_1 e_2$ or $(e_1)^*$, then we have to be slightly more careful since we have
to define how to search for the sets $W_1$ and $W_2$. We do this as follows:

1. Assume first that $e'$ is of the form $e_1 e_2$. Then CheckSimpleMemb$_\Box$ accepts $e'$ and $W$ iff
there are sets $W_1, W_2 \subseteq \Sigma^*$ such that: (1) if $w_1 w_2 \in W$, then $w_1 \in W_1$ or $w_2 \in W_2$, and
(2) CheckSimpleMemb$_\Box(e_1, W_1)$ accepts, and (3) CheckSimpleMemb$_\Box(e_2, W_2)$ accepts. Our
implementation, however, does not look over arbitrary sets $W_1$ and $W_2$, but only over the
sets which can be constructed as follows: for each $w \in W$ and for each $w_1, w_2 \in \Sigma^+$ (that
is, both $w_1$ and $w_2$ are nonempty) such that $w = w_1 w_2$, either pick $w_1$ and place it in
$W_1$ or pick $w_2$ and place it in $W_2$. If for some pair $(W_1, W_2)$ constructed in this way it
is the case that CheckSimpleMemb$_\Box(e_1, W_1)$ accepts and CheckSimpleMemb$_\Box(e_2, W_2)$ accepts,
then CheckSimpleMemb$_\Box(e', W)$ accepts. The reason why we can restrict ourselves to the
case of nonempty words is that neither $e_1$ nor $e_2$ equals $\varepsilon$ in $e'$ and, thus, the empty word
trivially does not match any of them. Clearly, our implementation remains sound and
complete.

2. Assume second that $e'$ is of the form $(e_1)^*$. Then CheckSimpleMemb$_\Box$ accepts iff there is a set
$W_1 \subseteq \Sigma^*$ such that: (1) for each word $w \in W$ and each of its decompositions $w = w_1 w_2 \cdots w_k$
some $w_i$ ($1 \le i \le k$) belongs to $W_1$, and (2) CheckSimpleMemb$_\Box(e_1, W_1)$ accepts. Again,
our implementation does not look over arbitrary sets $W_1$, but only over the sets which can
be constructed as follows: for each decomposition $w_1 w_2 \cdots w_k$ of each word in $W$ pick
an arbitrary $1 \le i \le k$ and place $w_i$ in $W_1$. If for some set $W_1$ constructed in this way
CheckSimpleMemb$_\Box(e_1, W_1)$ accepts, then CheckSimpleMemb$_\Box(e', W)$ accepts. Clearly, our
implementation remains sound and complete.

To estimate the time complexity of the above implementation, first we need to see that all
elements of all sets $W$ encountered in the algorithm are subwords of $w$, and thus all $W \in \mathcal{W}$, where $\mathcal{W}$ is
the powerset of the set of all subwords of $w$. Clearly the size of $\mathcal{W}$ depends only on $|w|$. Also the number
of cases tried by each subcall of CheckSimpleMemb$_\Box$ and the number of steps needed to construct
each $W_1$ (and $W_2$, respectively) depend only on $|w|$. Hence all these values are constant.

If along the algorithm we memoize the answers to subcalls, then our complexity will be upper-bounded
by the complexity of a dynamic-programming version of the above algorithm, which would calculate
CheckSimpleMemb$_\Box(e_1, W_1)$ for all subexpressions $e_1$ of $e'$ and all $W_1 \in \mathcal{W}$ in a bottom-up order. In
this approach, the computation of CheckSimpleMemb$_\Box(e_1, W_1)$ would take constant time, because
answers to subcalls would have been precomputed. Thus the total complexity of CheckSimpleMemb$_\Box$
is linear with respect to the size of the parse tree of $e'$ and thus with respect to $e$ as well.

We can try to accelerate the above algorithm by replacing all subexpressions of the form
$(f_1 \cup \cdots \cup f_i^* \cup \cdots \cup f_n)^*$ with $(f_1 \cup \cdots \cup f_i \cup \cdots \cup f_n)^*$, which does not change the semantics of
the expression, but avoids unnecessary computational effort. It will not lower the asymptotic
complexity, but will help lower the constant.

We finally prove that, for each word $w \in \Sigma^*$, the problem Membership$_\Diamond(w)$ can be solved in
time $O(n \log n)$. Obviously, we can treat all labels $a \in \Sigma$ which do not occur in $w$ as equal.
This simple observation makes the alphabet size fixed: $|\Sigma| \le |w| + 1$. Now, let $w$ be a word over
$\Sigma$. Next we construct an algorithm that, given a parameterized regular expression $e$ over $\Sigma$, checks
whether $w \in \mathcal{L}_\Diamond(e)$. Using techniques from [13] the algorithm first constructs an NFA $A$ over
$\Sigma \cup W$ that is equivalent to $e$, with $O(n)$ states and $O(n \log n)$ transitions, and then performs a
nondeterministic logarithmic space algorithm on $A$.

Assume that $W \subseteq \mathcal{V}$ is the set of variables that appear in $e$. Construct, in polynomial time,
an NFA $A$ (with set of states $Q$) over $\Sigma \cup W$ such that $\mathcal{L}(A) = \mathcal{L}(e)$. Let us assume, without loss
of generality, that $q_0$ is the unique initial state of $A$. Further, assume that $w = a_1 a_2 \cdots a_m$, where
each $a_i$ ($1 \le i \le m$) is a symbol in $\Sigma$. Then we perform the following nondeterministic algorithm
over $A$. The algorithm works in at most $m + 1$ steps. At each step $0 \le i \le m$ the state of the
algorithm consists of a pair $(q_i, \mu_i)$, where $q_i \in Q$ and $\mu_i$ is a mapping from some subset $W_i$ of $W$
into $\Sigma$. The initial state of the algorithm is $(q_0, \mu_0)$, where $\mu_0 : \emptyset \to \Sigma$ (recall that $q_0$ is the initial
state of $A$). Assume that the state of the algorithm in step $i < m$ is $(q_i, \mu_i)$. Then in step $i + 1$
the algorithm nondeterministically picks a pair $(q_{i+1}, \mu_{i+1})$ and checks that at least one of the
following conditions holds:

• There exists a transition labeled $a_{i+1} \in \Sigma$ from $q_i$ to $q_{i+1}$ in $A$ and $\mu_i = \mu_{i+1}$; that is, both $\mu_i$
and $\mu_{i+1}$ are mappings from $W_i$ into $\Sigma$, and $\mu_{i+1}(x) = \mu_i(x)$, for each $x \in W_i$.

• There exists a transition labeled $x \in \mathcal{V}$ from $q_i$ to $q_{i+1}$ in $A$, $x \notin W_i$, and $\mu_{i+1} : W_i \cup \{x\} \to \Sigma$
is defined as follows: $\mu_{i+1}(y) = \mu_i(y)$, for each $y \in W_i$, and $\mu_{i+1}(x) = a_{i+1}$.

• There exists a transition labeled $x \in \mathcal{V}$ from $q_i$ to $q_{i+1}$ in $A$, $x \in W_i$, $\mu_i(x) = a_{i+1}$, and $\mu_i = \mu_{i+1}$.

The procedure accepts if it reaches step $m$ in state $(q_m, \mu_m)$, for some accepting state $q_m$ of $A$.
Notice that, since $w$ is fixed, the size of each mapping $\mu$ from a subset of $W$ into $\Sigma$ is also fixed:
$|\mu| \le \min\{|w|, |W|\}$. That is because the initial mapping is empty and in each step of the algorithm
it can grow only by one. This means that the nondeterministic procedure described above works
in NLogspace.

It is not hard to prove (essentially using the same techniques as in the second part of the proof
of Proposition 2) that the procedure described above accepts the parameterized regular expression
$e$ if and only if $w \in \mathcal{L}_\Diamond(e)$.

Now let $M$ be the set of all mappings $\mu$ from subsets of $W$ to $\Sigma$ which can occur in the algorithm
presented above, and let $V$ be the set of all states $(q, \mu)$ which can occur therein. To see the
precise time complexity, we need to estimate $|M|$ and $|V|$. First, $|M| = O(|\Sigma|^{\min\{|w|, |W|\}})$, which
is fixed in our case. Then, $|V| = O(n \cdot |M|)$, which is linear in the number of states of $A$.

Now let us imagine a directed graph $G$ with the set of vertices $V$, in which there is an edge
from state $(q, \mu)$ to state $(q', \mu')$ iff the pair $(q', \mu')$ can be picked from the pair $(q, \mu)$ according to
the algorithm presented above. Each edge $(q, \mu) \to (q', \mu')$ corresponds to an edge $q \to q'$ in $A$,
and for each edge $q \to q'$ in $A$ there are at most $|M|$ edges $(q, \mu) \to (q', \mu')$ (one for each $\mu \in M$).
Therefore, $G$ has $O(n \log n)$ edges, since $|M|$ is fixed. We can also construct $G$ in $O(n \log n)$ time
and space.

Finally, it suffices to perform a reachability search in $G$ to see whether an accepting state can
be reached from node $(q_0, \mu_0)$ in $G$, which can clearly be done in linear time with respect to the size
of $G$. This gives us an algorithm with $O(n \log n)$ time complexity or, dropping the assumption
that $w$ is fixed, $O(|w| \cdot |\Sigma|^{\min\{|w|, |W|\}} \cdot n \log n)$ time complexity. Moreover, we can save space
by not constructing $G$ explicitly and computing it "on the fly", because standard graph search algorithms
run in $O(|V|)$ space. This might also be useful in terms of time, because a fixed word $w$ either reaches
an accepting state within a short path or does not do so at all, so usually most of $G$ would
not be touched by the search algorithm at all.
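The on-the-fly search can be sketched as follows in Python. The sketch makes some assumptions that go beyond the text: the NFA is $\varepsilon$-free and given as triples `(q, label, q2)`, labels in `variables` are variables, and each configuration also records the current position in $w$ (the text leaves the position implicit in the step number). The mapping $\mu$ is stored as a frozenset of `(variable, letter)` pairs so that configurations can be hashed. The function name `possibly_member` is hypothetical.

```python
from collections import deque

def possibly_member(transitions, variables, initial, finals, word):
    """Breadth-first search over configurations (state, position, mu)."""
    out = {}
    for q, label, q2 in transitions:
        out.setdefault(q, []).append((label, q2))

    start = (initial, 0, frozenset())
    seen = {start}
    queue = deque([start])
    while queue:
        q, i, mu = queue.popleft()
        if i == len(word):
            if q in finals:
                return True
            continue
        a = word[i]
        bound = dict(mu)
        for label, q2 in out.get(q, []):
            if label in variables:
                if label not in bound:
                    new_mu = mu | {(label, a)}  # first use: bind x to the letter read
                elif bound[label] == a:
                    new_mu = mu                 # x is already bound to this letter
                else:
                    continue                    # x is bound to a different letter
            elif label == a:
                new_mu = mu                     # ordinary letter transition
            else:
                continue
            nxt = (q2, i + 1, new_mu)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

With the automaton `[(0, "0", 1), (1, "x", 0), (0, "1", 2), (2, "x", 3), (3, "y", 2)]` for $(0x)^*1(xy)^*$ (initial state 0, final state 2), the word `01110` is accepted via $x \mapsto 1$, $y \mapsto 0$, while `00110` is rejected because $x$ would need two different values.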

It is worth analysing the gain in performance which the above method gives in comparison
to the direct approach. The straightforward algorithm calculates an NFA accepting $\mathcal{L}_\Diamond(e)$ and
runs a reachability search on it. The size of such an NFA is $O(|\Sigma|^{|W|})$, so the time complexity becomes
$O(|w| \cdot |\Sigma|^{|W|} \cdot n \log n)$. Hence, the only gain that the former algorithm gives is lowering the
exponent over $\Sigma$ from $|W|$ to $\min\{|w|, |W|\}$, and this is because it takes advantage of a smaller
class of mappings, confined by the length of the run of $w$ on $A$. In fact, this gain is very large for
problem instances with a relatively small $w$ and a huge $e$.
Proof of Theorem 4:

(Part 1) We begin with the proof for the first part. The lower bound follows directly from the fact
that checking universality is Pspace-hard even for complete regular expressions. For the upper
bound, we need the following easy claim, which follows from the definition of certain acceptance:

Claim 2 Let $e$ be a parameterized regular expression over an alphabet $\Sigma$. Then, $\mathcal{L}_\Box(e) = \Sigma^*$ if and
only if $\mathcal{L}(\nu(e)) = \Sigma^*$, for every valuation $\nu$ for $e$.

Then, a Pspace algorithm to solve the complement of the Universality$_\Box$ problem (whether
$\mathcal{L}_\Box(e) \ne \Sigma^*$) just guesses a valuation $\nu$ for $e$, and then checks whether $\mathcal{L}(\nu(e)) \ne \Sigma^*$. Since $\nu(e)$ is
a complete regular expression, it is well known that this decision procedure can be performed in
Pspace. This finishes the proof of the first part of the theorem.

(Part 2) Next, we prove the second part, that is, we show that Universality$_\Diamond$ is Expspace-complete.
We begin with the upper bound. Given a parameterized regular expression $e$, it is easy to
see that an equivalent complete regular expression $e'$ such that $\mathcal{L}_\Diamond(e) = \mathcal{L}(e')$ can be constructed
in exponential time: just take $\bigcup_{\nu \text{ is a valuation for } e} \nu(e)$ (the number of possible valuations is $|\Sigma|^{|W|}$).
Combining this fact with the well-known result that there is an algorithm to check whether the
language of a (complete) regular expression $e'$ is universal that requires only polynomial space w.r.t.
$e'$, we obtain our Expspace algorithm: first obtain a complete regular expression $e'$ such that
$\mathcal{L}_\Diamond(e) = \mathcal{L}(e')$, and then decide whether $\mathcal{L}(e') = \Sigma^*$.
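The exponential expansion used in the upper bound can be illustrated with a short Python sketch. It assumes, purely for the example, that a parameterized expression is written as a Python regular expression whose variables appear as `str.format` placeholders (so the template itself must not use literal braces); the helper name `expand_possibility` is hypothetical. The result is the union of $\nu(e)$ over all $|\Sigma|^{|W|}$ valuations, which can then be handed to an ordinary regex engine.

```python
import re
from itertools import product

def expand_possibility(template, variables, sigma):
    """Return (regex, count): the union of nu(e) over all valuations nu.

    template  -- e.g. "(0{x})*1({x}{y})*", variables written as placeholders
    variables -- list of variable names occurring in the template
    sigma     -- iterable of alphabet letters
    """
    disjuncts = []
    for letters in product(sigma, repeat=len(variables)):
        nu = {x: re.escape(a) for x, a in zip(variables, letters)}
        disjuncts.append("(?:" + template.format(**nu) + ")")
    return "|".join(disjuncts), len(disjuncts)

pattern, count = expand_possibility("(0{x})*1({x}{y})*", ["x", "y"], "01")
print(count)                                      # 4 valuations over {0, 1}
print(bool(re.fullmatch(pattern, "01110")))       # True
```

The number of disjuncts grows as $|\Sigma|^{|W|}$, which is exactly why this route only yields an exponential-size expression.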

For the lower bound we present a reduction from the complement of the acceptance problem of
a Turing machine. Let $L$ be a language that belongs to Expspace, and let $\mathcal{M}$ be a Turing machine
that decides $L$ in Expspace. Given an input $a = a_0, \ldots, a_{k-1}$, we construct in polynomial time
with respect to $\mathcal{M}$ and $a$ a parameterized regular expression $e_{\mathcal{M},a}$ such that $\mathcal{L}_\Diamond(e_{\mathcal{M},a}) = \Sigma^*$ if and
only if $\mathcal{M}$ does not accept $a$.

Assume that $\mathcal{M} = (Q, \Sigma, \Gamma, q_0, \{q_m\}, \delta)$; that is, the states of $\mathcal{M}$ are $Q = \{q_0, \ldots, q_m\}$, the initial
state is $q_0$, the input alphabet is $\Sigma$, $\Gamma$ is the union of $\Sigma$ and a number of symbols reserved
for the Turing machine, and the set of transitions of $\mathcal{M}$ is $\delta$. Without loss of generality, we
assume that $\mathcal{M}$ has only one tape, starts with the input copied on the first $|a|$ cells of this tape, has
only one final state $q_m$, and that no transition is defined for that state. Moreover, since $\mathcal{M}$ decides
$L$ in Expspace, there is a polynomial $S()$ such that, for every input $a$ over $\Sigma$, $\mathcal{M}$ decides $a$ using
space of order $2^{S(|a|)}$. Assume for notational convenience that $S(|a|) = n$ and, as usual, $\Gamma = \Sigma \cup \{B\}$.

Let $\Delta = \{0, 1, \$, \%, \#, \&\} \cup \Gamma \cup (\Gamma \times Q)$. Assume that $\Sigma = \{b_1, \ldots, b_p\}$. For the sake of readability,
for a set $B = \{b_1, \ldots, b_n\}$ of symbols we denote by $B$ the regular expression $b_1 \mid \cdots \mid b_n$. Thus,
for example, assume that $\Gamma = \Sigma \cup \{B\}$. Then, when we write $(\Gamma \cup (\Gamma \times Q))$ we represent the
language given by $(b_1 \mid \cdots \mid b_p \mid B \mid (b_1, q_0) \mid \cdots \mid (b_p, q_m))$. Using the alphabet $\Delta$, we represent
a configuration of the Turing machine by a word in the language

$$\#\,(\$[0]\%(\Gamma \cup (\Gamma \times Q))) \cdots (\$[2^n - 1]\%(\Gamma \cup (\Gamma \times Q)))\,\&$$
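As a small illustration of this encoding, the following Python sketch builds the word for one configuration. It assumes that $[i]$ stands for the $n$-bit binary representation of the cell address $i$, most significant bit first, which this part of the text does not state explicitly. The function name and the representation of symbols from $\Gamma \times Q$ as Python tuples are hypothetical choices for the example.

```python
def encode_configuration(tape, head, state, n):
    """Encode a configuration as a list of symbols over the alphabet Delta.

    tape  -- list of 2**n tape symbols (elements of Gamma)
    head  -- index of the cell currently scanned
    state -- current state (element of Q)
    """
    if len(tape) != 2 ** n:
        raise ValueError("tape must have exactly 2**n cells")
    word = ["#"]
    for i, symbol in enumerate(tape):
        word.append("$")
        word.extend(format(i, "0{}b".format(n)))  # assumed meaning of [i]
        word.append("%")
        # The scanned cell carries a pair from Gamma x Q, every other cell a plain symbol.
        word.append((symbol, state) if i == head else symbol)
    word.append("&")
    return word

print(encode_configuration(["a", "B"], 0, "q0", 1))
# ['#', '$', '0', '%', ('a', 'q0'), '$', '1', '%', 'B', '&']
```

Exactly one position of the resulting word is a pair from $\Gamma \times Q$, matching the requirement that a configuration has a single reading position.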

Next we construct a parameterized regular expression $e_{\mathcal{M},a}$ such that $\mathcal{L}_\Diamond(e_{\mathcal{M},a}) \ne \Sigma^*$ if and only if
$\mathcal{M}$ accepts on input $a$. Define $e_{\mathcal{M},a} = e^{form} \mid e^{i} \mid e^{f} \mid e^{trans}$, where

• $e^{form}$ describes all the words that do not represent a concatenation of configurations of $\mathcal{M}$.

• $e^{i}$ describes words that do not start with the initial configuration of $\mathcal{M}$ over input $a$.
• $e^{f}$ describes words that do not end in a final configuration for $\mathcal{M}$.

• $e^{trans}$ describes words that contain two consecutive configurations $\alpha$ and $\beta$ such that $\alpha$ and
$\beta$ do not agree on $\delta$.

We now describe these expressions. Expression $e^{form}$ is the union of the following expressions,
describing that:

• The first symbol of the word is not #:

$$e_1^{form} = \varepsilon \mid (\Delta - \{\#\})\Delta^*$$

• The last symbol of the word is not &:

$$e_2^{form} = \Delta^*(\Delta - \{\&\})$$

• After a # we do not have the symbol $:

$$e_3^{form} = \Delta^* \# (\Delta - \{\$\})\Delta^*$$

• Between the symbols $ and % there are fewer than $n$ symbols:

$$e_4^{form} = \Delta^* \$ (\varepsilon \mid \Delta)^{n-1} \% \Delta^*$$

• Between the symbols $ and % there are more than $n$ symbols:

$$e_5^{form} = \Delta^* \$ (\Delta - \{\%\})^{n+1} (\Delta - \{\%\})^* \% \Delta^*$$

• Between the symbols $ and % there is a symbol not in {0, 1}:

$$e_6^{form} = \Delta^* \$ (0 \mid 1)^* (\Delta - \{0, 1, \%\})\Delta^*$$

• After a word in $(0 \mid 1)^n$ we do not have the symbol %:

$$e_7^{form} = \Delta^* (0 \mid 1)^n (\Delta - \{\%\})\Delta^*$$

• After the symbol % we do not have a symbol in $(\Gamma \cup (\Gamma \times Q))$:

$$e_8^{form} = \Delta^* \% (\Delta - (\Gamma \cup (\Gamma \times Q)))\Delta^*$$

• After a word in $[i]\%(\Gamma \cup (\Gamma \times Q))$ we do not have the symbol $, for $0 \le i \le 2^n - 2$:

$$e_{9_1}^{form} = \Delta^*\, 0 (0 \mid 1)^{n-1} \% (\Gamma \cup (\Gamma \times Q))(\Delta - \{\$\})\Delta^*$$

$$e_{9_2}^{form} = \Delta^* (0 \mid 1)\, 0 (0 \mid 1)^{n-2} \% (\Gamma \cup (\Gamma \times Q))(\Delta - \{\$\})\Delta^*$$

$$e_{9_3}^{form} = \Delta^* (0 \mid 1)^{n-1}\, 0\, \% (\Gamma \cup (\Gamma \times Q))(\Delta - \{\$\})\Delta^*$$

• After a word in $[2^n - 1]\%(\Gamma \cup (\Gamma \times Q))$ we do not have the symbol &:

$$e_{10}^{form} = \Delta^*\, 1^n \% (\Gamma \cup (\Gamma \times Q))(\Delta - \{\&\})\Delta^*$$

• After the symbol & we do not have the symbol #:

$$e_{11}^{form} = \Delta^* \& (\Delta - \{\#\})\Delta^*$$

• Between a symbol # and & there is no symbol in $\Gamma \times Q$ (a configuration does not have a
reading position):

$$e_{12}^{form} = \Delta^* \# (\Delta - \{\&\} - (\Gamma \times Q))^* \& \Delta^*$$

• Between a symbol # and & there is more than one symbol in $\Gamma \times Q$ (a configuration features
two positions being read by the machine):

$$e_{13}^{form} = \Delta^* \# (\Delta - \{\&\})^* (\Gamma \times Q)(\Delta - \{\&\})^* (\Gamma \times Q)(\Delta - \{\&\})^* \& \Delta^*$$

• After the word #$ we do not have the word [0]: 

47 = a*#$i(c 

e form = A * #$(Q | 1)1(0 | 1 
e form = A * #$ (Q | l)"-llA* 



form = A * #$1 (Q | !)n-l A * 

form A *_£!£/' n I i\i/'n I i"in-2A* 



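• After a word $\$[i]\%(\Gamma \cup (\Gamma \times Q))$ we do not follow with $[i + 1]\%(\Gamma \cup (\Gamma \times Q))$, where $i$ is even:

$e^{form}_{15,1} = \Delta^*(0|1)^{n-1}0\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-1}0\Delta^*$

$e^{form}_{15,2,1} = \Delta^*(0|1)^{n-2}00\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-2}11\Delta^*$

$e^{form}_{15,2,2} = \Delta^*(0|1)^{n-2}10\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-2}01\Delta^*$

$e^{form}_{15,3,1} = \Delta^*(0|1)^{n-3}0(0|1)0\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-3}1(0|1)1\Delta^*$

$e^{form}_{15,3,2} = \Delta^*(0|1)^{n-3}1(0|1)0\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-3}0(0|1)1\Delta^*$

$\vdots$

$e^{form}_{15,n,1} = \Delta^*0(0|1)^{n-2}0\%(\Gamma \cup (\Gamma \times Q))\$1(0|1)^{n-2}1\Delta^*$

$e^{form}_{15,n,2} = \Delta^*1(0|1)^{n-2}0\%(\Gamma \cup (\Gamma \times Q))\$0(0|1)^{n-2}1\Delta^*$
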
• After a word $\$[i]\%(\Gamma \cup (\Gamma \times Q))$ we do not follow with $[i + 1]\%(\Gamma \cup (\Gamma \times Q))$, where $i$ is odd
and $i < 2^n - 1$:

$e^{form}_{16_1} = \Delta^*(0|1)^{n-1}1\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-1}1\Delta^*$

$e^{form}_{16_2,1} = \Delta^*(0|1)^{n-2}01\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-2}00\Delta^*$

$e^{form}_{16_2,2,1} = \Delta^*(0|1)^{n-3}001\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-3}110\Delta^*$

$e^{form}_{16_2,2,2} = \Delta^*(0|1)^{n-3}101\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-3}010\Delta^*$

$e^{form}_{16_2,3,1} = \Delta^*(0|1)^{n-4}0(0|1)01\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-4}1(0|1)10\Delta^*$

$e^{form}_{16_2,3,2} = \Delta^*(0|1)^{n-4}1(0|1)01\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-4}0(0|1)10\Delta^*$

$\vdots$

$e^{form}_{16_2,n-2,1} = \Delta^*0(0|1)^{n-3}01\%(\Gamma \cup (\Gamma \times Q))\$1(0|1)^{n-3}10\Delta^*$

$e^{form}_{16_2,n-2,2} = \Delta^*1(0|1)^{n-3}01\%(\Gamma \cup (\Gamma \times Q))\$0(0|1)^{n-3}10\Delta^*$

$e^{form}_{16_3,1} = \Delta^*(0|1)^{n-3}011\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-3}(000 \mid 110 \mid 010)\Delta^*$

$e^{form}_{16_3,2,1} = \Delta^*(0|1)^{n-4}0011\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-4}1100\Delta^*$

$e^{form}_{16_3,2,2} = \Delta^*(0|1)^{n-4}1011\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-4}0100\Delta^*$

$e^{form}_{16_3,3,1} = \Delta^*(0|1)^{n-5}0(0|1)011\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-5}1(0|1)100\Delta^*$

$e^{form}_{16_3,3,2} = \Delta^*(0|1)^{n-5}1(0|1)011\%(\Gamma \cup (\Gamma \times Q))\$(0|1)^{n-5}0(0|1)100\Delta^*$

$\vdots$

$e^{form}_{16_3,n-3,1} = \Delta^*0(0|1)^{n-4}011\%(\Gamma \cup (\Gamma \times Q))\$1(0|1)^{n-4}100\Delta^*$

$e^{form}_{16_3,n-3,2} = \Delta^*1(0|1)^{n-4}011\%(\Gamma \cup (\Gamma \times Q))\$0(0|1)^{n-4}100\Delta^*$

$\vdots$

$e^{form}_{16_n,1} = \Delta^*01^{n-1}\%(\Gamma \cup (\Gamma \times Q))\$(\{(0|1)^n\} - \{1^n\})\Delta^*$



Notice that expression $e^{form}$ is of polynomial size. In particular, the language $(\{(0|1)^n\} - \{1^n\})$
can be described with the expression

$0(0|1)^{n-1} \mid (0|1)0(0|1)^{n-2} \mid \cdots \mid (0|1)^{n-1}0$
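As a sanity check, the following sketch builds this union for a concrete $n$ and verifies by exhaustive enumeration that it matches exactly the words of length $n$ over $\{0,1\}$ other than $1^n$. The helper names are hypothetical, and the check uses Python's `re` syntax in place of the paper's notation.

```python
import itertools
import re


def at_least_one_zero(n):
    """Regex for 0(0|1)^{n-1} | (0|1)0(0|1)^{n-2} | ... | (0|1)^{n-1}0."""
    return "|".join("[01]{%d}0[01]{%d}" % (j, n - 1 - j) for j in range(n))


def describes_complement_of_all_ones(n):
    """Check that the union matches every word in {0,1}^n except 1^n."""
    pattern = re.compile(at_least_one_zero(n))
    for bits in itertools.product("01", repeat=n):
        word = "".join(bits)
        matched = pattern.fullmatch(word) is not None
        if matched != (word != "1" * n):
            return False
    return True
```

The union has $n$ disjuncts, each of size linear in $n$, which is why it does not affect the polynomial bound on the size of $e^{form}$.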



Next, expression $e^{i}$ is the union of the following expressions, describing that:

• The first configuration does not contain the initial state:

$e^{i}_1 = \#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times (Q - \{s_0\}))))^*\&\Delta^*$

• The first configuration does not contain the head in the initial position:

$e^{i}_2 = \#\$0^n\%\Gamma\Delta^*$

• The first configuration does not have the word $a = a_1 \cdots a_k$ in its first $|a| = k$ symbols:

$e^{i}_{3,1} = \#\$0^n\%(\Gamma \times Q - \{(a_1, s_0)\})\Delta^*$

$\vdots$

$e^{i}_{3,k} = \#\$[0]\%(\Gamma \cup (\Gamma \times Q))\$[1]\%(\Gamma \cup (\Gamma \times Q)) \cdots \$[k - 1]\%(\Gamma \cup (\Gamma \times Q) - \{a_k\})\Delta^*$

• The rest of the symbols of the first configuration are not blank symbols:

$e^{i}_4 = \#\$[0]\%(\Gamma \cup (\Gamma \times Q))\$[1]\%(\Gamma \cup (\Gamma \times Q)) \cdots \$[k - 1]\%(\Delta - \{\&\})^*(\Gamma \cup (\Gamma \times Q) - \{B\})\Delta^*$

Furthermore, expression $e^{f}$ describes words whose final configuration does not end in a final
state:

$e^{f} = \Delta^*\#(\Delta - \{\&\})^*((\Gamma \times Q) - \{(a, s_m) \mid a \in \Gamma\})(\Delta - \{\&\})^*\&$

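Finally, expression $e^{trans}$ is the union of the following expressions, describing that:

• A cell not pointed by the head changed its content:

$e^{trans}_1 = \bigcup_{a \in \Gamma} \Delta^*\#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%a(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&$
$\#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%(\Gamma - \{a\} \cup (\Gamma \times Q) - (\{a\} \times Q))(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&\Delta^*$

• A configuration that is not final features a pair in $Q \times \Sigma$ for which no transition is defined
(the last # states the configuration is not final):

$e^{trans}_2 = \bigcup_{a \in \Gamma, s \in Q \,\mid\, \delta(s,a) \text{ is not defined}} \Delta^*\#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*$
$\$(0|1)^n\%(a, s)(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&\#\Delta^*$

• The change of state does not agree with $\delta$:

$e^{trans}_3 = \bigcup_{a \in \Gamma, s \in Q \,\mid\, \delta(s,a) = (s', a', \{\leftarrow, \rightarrow\})} \Delta^*\#(\Delta - \{\&\})^*(a, s)(\Delta - \{\&\})^*\&$
$\#(\Delta - \{\&\})^*(\Gamma \times (Q - \{s'\}))(\Delta - \{\&\})^*\&\Delta^*$

• The symbol written in a given step does not agree with $\delta$:

$e^{trans}_4 = \bigcup_{a \in \Gamma, s \in Q \,\mid\, \delta(s,a) = (s', a', \{\leftarrow, \rightarrow\})}$
$\Delta^*\#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$y_1 \cdots y_n\%(a, s)(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&$
$\#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$y_1 \cdots y_n\%(\Gamma - \{a'\})(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&\Delta^*$

• The movement of the head does not agree with $\delta$:

$e^{trans}_5 = \bigcup_{a \in \Gamma, s \in Q \,\mid\, \delta(s,a) = (s', a', \rightarrow)}$
$\Delta^*\#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%(a, s)(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&$
$\#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%(a')(\varepsilon \mid (\$(0|1)^n\%\Gamma(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*))\&\Delta^*$
$\mid \bigcup_{a \in \Gamma, s \in Q \,\mid\, \delta(s,a) = (s', a', \leftarrow)}$
$\Delta^*\#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%(a, s)(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&$
$\#(\varepsilon \mid ((\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$(0|1)^n\%\Gamma))\$x_1 \cdots x_n\%(a')(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&\Delta^*$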

It is now straightforward to show that $\mathcal{L}_\Diamond(e_{M,a}) = \Sigma^*$ if and only if $M$ does not accept on
input $a$. This finishes the proof of the EXPSPACE lower bound. □

Proof of Proposition 3: 

We explain how to adapt the reduction of Theorem 4 so that it no longer uses parameterized
expressions that are not simple.

Recall that the previous reduction used alphabet $\Delta = \{0, 1, \$, \%, \#, \&\} \cup \Gamma \cup (\Gamma \times Q)$.

For the simple case, we need a slightly bigger alphabet. Let $\Delta = \{0, 1, \$, \%, \#^e, \&^e, \#^o, \&^o\} \cup
\Gamma \cup (\Gamma \times Q)$. The idea is to modify the way configurations are represented. Previously,
each configuration was represented by a word in

$\#(\$[0]\%(\Gamma \cup (\Gamma \times Q))) \cdots (\$[2^n - 1]\%(\Gamma \cup (\Gamma \times Q)))\&$.

In the modified reduction, however, configurations can be represented by either one of these
expressions:

$\#^e(\$[0]\%(\Gamma \cup (\Gamma \times Q))) \cdots (\$[2^n - 1]\%(\Gamma \cup (\Gamma \times Q)))\&^e$, or
$\#^o(\$[0]\%(\Gamma \cup (\Gamma \times Q))) \cdots (\$[2^n - 1]\%(\Gamma \cup (\Gamma \times Q)))\&^o$.

The intuition is that configurations using $\#^e$ and $\&^e$ represent an even step of the computation
of the Turing machine, whereas configurations using $\#^o$ and $\&^o$ represent an odd step.

It is now straightforward to modify expressions $e^{form}$, $e^{i}$ and $e^{f}$ to work under this codification
of configurations. We just have to make sure that

• $e^{form}$ describes all the words that do not represent a concatenation of configurations of $M$,
where now valid sequences of configurations have the form $\#^e u \&^e \#^o u' \&^o \#^e u'' \&^e \cdots$, that is,
they start in an even step, and then follow an even, odd, even, odd, ... pattern.

• $e^{i}$ describes words that do not start with the initial configuration of $M$ over input $a$.

• $e^{f}$ describes words that do not end in a final configuration for $M$.

The description of these expressions is omitted, since their extension is straightforward. Next,
we show how one of the expressions in $e^{trans}$ is to be modified, the remaining ones being analogous.

Consider expression $e^{trans}_1$, which intuitively accepts all words describing two configurations in
which a cell not pointed by the head changed its content. It was defined previously as

$e^{trans}_1 = \bigcup_{a \in \Gamma} \Delta^*\#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%a(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&$
$\#(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%(\Gamma - \{a\} \cup (\Gamma \times Q) - (\{a\} \times Q))(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&\Delta^*$

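Then, we redefine it as $e^{trans}_{1,e} \mid e^{trans}_{1,o}$, where

$e^{trans}_{1,e} = \bigcup_{a \in \Gamma} \Delta^*\#^e(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$$
$\big(x_1 \cdots x_n \big((\%a(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&^e\#^o(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$) \mid$
$(\%(\Gamma - \{a\} \cup (\Gamma \times Q) - (\{a\} \times Q))(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&^o)\big)\big)^* (\#^e\Delta^* + \varepsilon)$

$e^{trans}_{1,o} = \bigcup_{a \in \Gamma} \Delta^*\#^o(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$$
$\big(x_1 \cdots x_n \big((\%a(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&^o\#^e(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$) \mid$
$(\%(\Gamma - \{a\} \cup (\Gamma \times Q) - (\{a\} \times Q))(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&^e)\big)\big)^* (\#^o\Delta^* + \varepsilon)$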



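Here every appearance of $x_1, \ldots, x_n$ represents a new set of $n$ fresh variables. Notice
then that these expressions are simple parameterized regular expressions. In order to see that the
intended meaning of these expressions remains untouched, consider expressions

$e'^{trans}_{1,e} = \bigcup_{a \in \Gamma} \Delta^*\#^e(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%a(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&^e$
$\#^o(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%(\Gamma - \{a\} \cup (\Gamma \times Q) - (\{a\} \times Q))(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&^o\Delta^*$

$e'^{trans}_{1,o} = \bigcup_{a \in \Gamma} \Delta^*\#^o(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%a(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&^o$
$\#^e(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\$x_1 \cdots x_n\%(\Gamma - \{a\} \cup (\Gamma \times Q) - (\{a\} \times Q))(\$(0|1)^n\%(\Gamma \cup (\Gamma \times Q)))^*\&^e\Delta^*$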

It is now easy to see that $\mathcal{L}(e'^{trans}_{1,e}) \subseteq \mathcal{L}(e^{trans}_{1,e})$ and $\mathcal{L}(e'^{trans}_{1,o}) \subseteq \mathcal{L}(e^{trans}_{1,o})$. Moreover, it is easy
to check that none of the words in $\mathcal{L}(e^{trans}_{1,e})$ but not in $\mathcal{L}(e'^{trans}_{1,e})$ represent a valid sequence of
configurations, and neither does any word in $\mathcal{L}(e^{trans}_{1,o})$ but not in $\mathcal{L}(e'^{trans}_{1,o})$. Using this observation,
it is possible to modify all expressions in $e^{trans}$ so that they are simple. We have omitted the
rest of the proof since it goes along the same lines as the reduction of Theorem 4.

Proof of Proposition 4:

(Part 1) It is well known that Containment$_\Box(e_1, \cdot)$ is PSPACE-hard even for complete regular
expressions. For the upper bound, let $e_1'$ be an expression such that $\mathcal{L}(e_1') = \mathcal{L}_\Box(e_1)$. Notice that,
since $e_1$ is fixed and by Proposition 5, expression $e_1'$ can be computed in constant time. Then, it
suffices to guess a valuation $\nu$ and a word $w$ such that $w \in \mathcal{L}(e_1')$, but $w \notin \mathcal{L}(\nu(e_2))$, which can
clearly be done in PSPACE.

(Part 2) We begin with the upper bound for the problem Containment$_\Diamond(\cdot, e_2)$. Assume that
the input is a parameterized regular expression $e_1$, using variables in $W$, and let $\Sigma$ be the alphabet
of $e_1$. The coNP algorithm is as follows. First, construct a DFA $A_{e_2}$ such that $\mathcal{L}(A_{e_2}) = \mathcal{L}_\Diamond(e_2)$,
and then construct $A^C_{e_2}$, the automaton that accepts the complement of $\mathcal{L}(A_{e_2})$ (since $e_2$ is fixed,
and by Proposition 5, this construction can be done in constant time). Next, guess a valuation $\nu$
from $W$ to $\Sigma$, and, from $\nu(e_1)$, construct an automaton $A_{\nu(e_1)}$ such that $\mathcal{L}(A_{\nu(e_1)}) = \mathcal{L}(\nu(e_1))$ (it is
a standard observation that this automaton can be constructed in polynomial time from $\nu(e_1)$).
Finally, check that $\mathcal{L}(A_{\nu(e_1)}) \cap \mathcal{L}(A^C_{e_2}) \neq \emptyset$, which can be performed in polynomial time using a standard
reachability test over the product of $A_{\nu(e_1)}$ and $A^C_{e_2}$. Let us show that this algorithm is sound and
complete. If the intersection $\mathcal{L}(A_{\nu(e_1)}) \cap \mathcal{L}(A^C_{e_2})$ is nonempty, then there is a word $w \in \mathcal{L}(\nu(e_1))$, and thus in
$\mathcal{L}_\Diamond(e_1)$, that does not belong to $\mathcal{L}_\Diamond(e_2)$; in other words, $\mathcal{L}_\Diamond(e_1)$ is not contained in $\mathcal{L}_\Diamond(e_2)$.
On the other hand, it is clear that if $\mathcal{L}(A_{\nu(e_1)}) \cap \mathcal{L}(A^C_{e_2}) = \emptyset$ for all possible valuations $\nu$ from $W$ to $\Sigma$,
then $\mathcal{L}_\Diamond(e_1)$ is contained in $\mathcal{L}_\Diamond(e_2)$.

The hardness is established via a reduction from 3-SAT to the complement of
Containment$_\Diamond(\cdot, e_2)$. Let $e_2$ be the following regular expression over alphabet $\Sigma = \{0, 1, \#\}$:

$e_2 = ((10 \mid 01)^*\#((0 \mid 1)^3)^*000((0 \mid 1)^3)^*) \mid (((0 \mid 1)^2)^*(00 \mid 11)\Sigma^*) \mid (\Sigma^*\#\Sigma^*\#\Sigma^*)$,

and let $\varphi = \bigwedge_{1 \le i \le n} (\ell_i^1 \vee \ell_i^2 \vee \ell_i^3)$ be a propositional formula in 3-CNF over variables $\{p_1, \ldots, p_m\}$.
That is, each literal $\ell_i^j$, for $1 \le i \le n$ and $1 \le j \le 3$, is either $p_k$ or $\neg p_k$, for $1 \le k \le m$. Next
we show how to construct in polynomial time from $\varphi$ a parameterized regular expression $e_1$ over
alphabet $\Sigma = \{0, 1, \#\}$ such that $\varphi$ is satisfiable if and only if $\mathcal{L}_\Diamond(e_1) \not\subseteq \mathcal{L}(e_2)$.

Let $W = \{x_i, \bar{x}_i \mid 1 \le i \le m\}$. Intuitively, each $x_i$ represents the value assigned to $p_i$, and $\bar{x}_i$
represents the value of $\neg p_i$. Moreover, assume that $h$ is a mapping from the literals $\ell_i^j$ ($1 \le i \le n$
and $1 \le j \le 3$) to $W$, defined as expected: $h(\ell_i^j) = x_k$ if $\ell_i^j$ is $p_k$, for some $1 \le k \le m$, and
$h(\ell_i^j) = \bar{x}_k$ if $\ell_i^j$ is $\neg p_k$.

Define $e_1$ as follows:

$e_1 = x_1\bar{x}_1 \cdots x_m\bar{x}_m \# h(\ell_1^1)h(\ell_1^2)h(\ell_1^3) \cdots h(\ell_n^1)h(\ell_n^2)h(\ell_n^3)$

We show that $\varphi$ is satisfiable if and only if $\mathcal{L}_\Diamond(e_1) \not\subseteq \mathcal{L}(e_2)$.

($\Rightarrow$): Assume that $\varphi$ is satisfiable by valuation $\sigma$. Let $\nu$ be a valuation from $W$ to $\Sigma$, defined
as follows:

• For each $1 \le k \le m$, $\nu(x_k) = 1$ if $\sigma(p_k) = 1$, and $\nu(x_k) = 0$ otherwise.

• For each $1 \le k \le m$, $\nu(\bar{x}_k) = 0$ if $\sigma(p_k) = 1$, and $\nu(\bar{x}_k) = 1$ otherwise.

Notice that $\mathcal{L}(\nu(e_1))$ contains a single word. We shall abuse the notation and denote with $\nu(e_1)$
both this word and the aforementioned expression. It is clear that $\nu(e_1)$ contains a single symbol
#, and starts with a prefix in $(01 \mid 10)^*\#$. Thus, if $\mathcal{L}_\Diamond(e_1) \subseteq \mathcal{L}(e_2)$ it must be that $\nu(e_1)$ is denoted
by the expression $(10 \mid 01)^*\#((0 \mid 1)^3)^*000((0 \mid 1)^3)^*$. But this implies that there are literals $\ell_i^1$, $\ell_i^2$
and $\ell_i^3$, for some $1 \le i \le n$, such that $\nu$ assigns the word 000 to $h(\ell_i^1)h(\ell_i^2)h(\ell_i^3)$. By construction
of $\nu$, it must then be that $\sigma$ falsifies the $i$-th clause of $\varphi$, which contradicts the fact that $\sigma$ is a
satisfying assignment.

($\Leftarrow$): Assume now that $\mathcal{L}_\Diamond(e_1) \not\subseteq \mathcal{L}(e_2)$. By the definition of the $\Diamond$-semantics, there is at least
one valuation $\nu$ from $W$ to $\Sigma$ such that $\mathcal{L}(\nu(e_1)) \not\subseteq \mathcal{L}(e_2)$. Notice again that, by the construction
of $e_1$, $\mathcal{L}(\nu(e_1))$ contains a single word. Again, we shall denote this word also by $\nu(e_1)$. Then if
$\mathcal{L}(\nu(e_1)) \not\subseteq \mathcal{L}(e_2)$ it must be that $\nu(e_1)$ is not in $\mathcal{L}(e_2)$. This immediately entails that $\nu(e_1)$ cannot
have two or more copies of the symbol #, and thus we conclude that $\nu$ assigns to each variable in
$W$ a symbol in $\{0, 1\}$. From the above observation, notice that the following valuation $\sigma$ for the
variables in $\varphi$ is well defined:

• $\sigma(p_i) = 1$ if $\nu(x_i) = 1$, and $\sigma(p_i) = 0$ if $\nu(x_i) = 0$.

Next, we show that for all $1 \le i \le m$, it is the case that $\nu(x_i) \neq \nu(\bar{x}_i)$. Assume for the sake of
contradiction that for some $1 \le i \le m$, we have that $\nu(x_i) = \nu(\bar{x}_i)$. From the construction of $e_1$,
we then have that $\nu(e_1)$ is denoted by the expression $((0 \mid 1)^2)^*(00 \mid 11)\Sigma^*$, which contradicts the
fact that $\nu(e_1)$ is not in $\mathcal{L}(e_2)$. Finally, we claim that $\varphi$ is satisfiable by valuation $\sigma$. Assume the
contrary. Then there is a clause of form $(\ell_i^1 \vee \ell_i^2 \vee \ell_i^3)$, for some $1 \le i \le n$, such that, for each
$1 \le j \le 3$, if $\ell_i^j$ is the literal $p_k$, for some $1 \le k \le m$, then $\sigma$ assigns the value 0 to $p_k$, and if $\ell_i^j$ is
the literal $\neg p_k$, for some $1 \le k \le m$, then $\sigma$ assigns the value 1 to $p_k$. It is now straightforward to
conclude that this fact contradicts the assumption that $\nu(e_1)$ is not in $\mathcal{L}(e_2)$, by studying all of the
8 possible cases.
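The reduction can be tried out on small formulas. The sketch below is only an illustration under stated assumptions: clauses are given as triples of nonzero integers ($k$ for $p_k$, $-k$ for $\neg p_k$), $e_2$ is transcribed into Python `re` syntax, and all function names are hypothetical. It enumerates every valuation of $W$ over $\{0, 1, \#\}$ by brute force, so it is exponential and meant only as a cross-check against a naive satisfiability test.

```python
import itertools
import re

# e_2 over {0, 1, #}, transcribed into Python regex syntax.
E2 = re.compile(
    r"(?:(?:10|01)*#(?:[01]{3})*000(?:[01]{3})*)"
    r"|(?:(?:[01]{2})*(?:00|11)[01#]*)"
    r"|(?:[01#]*#[01#]*#[01#]*)"
)


def build_e1(clauses, m):
    """e_1 as a token list: x_1 xbar_1 ... x_m xbar_m # h(l) h(l) h(l) ..."""
    tokens = []
    for k in range(1, m + 1):
        tokens += [("x", k), ("xbar", k)]
    tokens.append("#")
    for clause in clauses:
        for lit in clause:
            tokens.append(("x", lit) if lit > 0 else ("xbar", -lit))
    return tokens


def some_valuation_escapes_e2(clauses, m):
    """True iff L_possibly(e_1) is not contained in L(e_2)."""
    e1 = build_e1(clauses, m)
    variables = sorted({t for t in e1 if t != "#"})
    for letters in itertools.product("01#", repeat=len(variables)):
        nu = dict(zip(variables, letters))
        word = "".join("#" if t == "#" else nu[t] for t in e1)
        if E2.fullmatch(word) is None:
            return True
    return False


def satisfiable(clauses, m):
    """Naive truth-table test, used only to cross-check the reduction."""
    for bits in itertools.product([False, True], repeat=m):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

On every input the two functions should agree, which is exactly the equivalence established in the two directions above.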

Proof of Proposition 5 

Let $e$ be a parameterized regular expression over alphabet $\Sigma$, using variables in $W$. Throughout this
proof we heavily rely on the fact that there are $|\Sigma|^{|W|}$ possible valuations $\nu : W \to \Sigma$ for $e$.

First we show how to construct in double exponential time an NFA $A_\Box$ such that $\mathcal{L}_\Box(e) =
\mathcal{L}(A_\Box)$. For each valuation $\nu : W \to \Sigma$, we denote by $A_{\nu(e)}$ the NFA such that $\mathcal{L}(A_{\nu(e)}) = \mathcal{L}(\nu(e))$ (this
can be performed in PTIME by doing any standard regular expression to automata translation).
Then, notice that $\mathcal{L}_\Box(e) = \bigcap_{\nu : W \to \Sigma} \mathcal{L}(\nu(e))$, and thus we can just take the product of them. Given
that there exist $|\Sigma|^{|W|}$ possible valuations from $W$ to $\Sigma$, and that each $A_{\nu(e)}$ can be constructed
in time $O(|e| \log |e|)$ [13], the NFA for the product $\prod_{\nu : W \to \Sigma} A_{\nu(e)}$ can be constructed in time
$|e|^{O(|\Sigma|^{|W|})}$.

Next, in order to construct an automaton $A_\Diamond$ such that $\mathcal{L}_\Diamond(e) = \mathcal{L}(A_\Diamond)$, one computes an NFA
that represents $\bigcup_{\nu : W \to \Sigma} \mathcal{L}(A_{\nu(e)})$ by combining all the automata with a nondeterministic choice at the
beginning. Given that there exist $|\Sigma|^{|W|}$ possible valuations from $W$ to $\Sigma$, and that each $A_{\nu(e)}$ can
be constructed in time $O(|e| \log^2 |e|)$ [13], we have that the automaton $A_\Diamond$ can be constructed in
time $O(|\Sigma|^{|W|} \cdot |e| \cdot \log^2 |e|)$.
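The two identities behind this proof, $\mathcal{L}_\Box(e) = \bigcap_\nu \mathcal{L}(\nu(e))$ and $\mathcal{L}_\Diamond(e) = \bigcup_\nu \mathcal{L}(\nu(e))$, can be illustrated by a brute-force membership test that simply enumerates all $|\Sigma|^{|W|}$ valuations. This is a minimal sketch, not the automata construction of the proof: it assumes the expression is written as a Python regex template with placeholders such as `{x}` for variables (so the template must not contain literal braces), and the function names are hypothetical.

```python
import itertools
import re


def valuations(variables, alphabet):
    """Enumerate all |alphabet|**|variables| valuations as dictionaries."""
    for letters in itertools.product(alphabet, repeat=len(variables)):
        yield dict(zip(variables, letters))


def instantiate(template, valuation):
    """Replace each placeholder {x} by the (escaped) letter assigned to x."""
    return template.format(**{v: re.escape(a) for v, a in valuation.items()})


def possibly(template, variables, alphabet, word):
    """Membership under the possibility semantics: some valuation accepts."""
    return any(
        re.fullmatch(instantiate(template, nu), word) is not None
        for nu in valuations(variables, alphabet)
    )


def certainly(template, variables, alphabet, word):
    """Membership under the certainty semantics: every valuation accepts."""
    return all(
        re.fullmatch(instantiate(template, nu), word) is not None
        for nu in valuations(variables, alphabet)
    )
```

For the expression $(0x)^*1(xy)^*$ from the introduction, written as the template `"(0{x})*1({x}{y})*"`, the word 01110 is accepted under the possibility semantics (take $x \mapsto 1$, $y \mapsto 0$) but not under the certainty semantics, while the word 1 is accepted under both.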

Proof of Theorem 6 

(Part 1) We begin with the double exponential bound for $\mathcal{L}_\Box$. For each $n \in \mathbb{N}$, let $e_n$ be the
following parameterized regular expression over alphabet $\Sigma = \{0, 1\}$ and variables $x_1, \ldots, x_{n+1}$:

$e_n = ((0 \mid 1)^{n+1})^* \cdot x_1 \cdots x_n \cdot x_{n+1} \cdot ((0 \mid 1)^{n+1})^*$.

Notice that each $e_n$ uses $n + 1$ variables, and is of linear size in $n$. In order to show that every
NFA deciding $\mathcal{L}_\Box(e_n)$ has at least $2^{2^n}$ states, we use the following result from [10]: if $L \subseteq \Sigma^*$ is a regular
language, and there exists a set of pairs $P = \{(u_i, v_i) \mid 1 \le i \le m\} \subseteq \Sigma^* \times \Sigma^*$ such that

1. $u_i v_i \in L$, for every $1 \le i \le m$,

2. $u_j v_i \notin L$, for every $1 \le i, j \le m$ and $i \neq j$,

then every NFA accepting $L$ has at least $m$ states.

Given a collection $S$ of words over $\{0, 1\}$, let $w_S$ denote the concatenation, in lexicographical
order, of all the words that belong to $S$, and let $w^n_{\bar{S}}$ denote the concatenation of all words in
$\{0, 1\}^{n+1}$ that are not in $S$.

Then, define a set of pairs $P_n = \{(w_S, w^n_{\bar{S}}) \mid S \subseteq \{0, 1\}^{n+1} \text{ and } |S| = 2^n\}$. Since there are
$2^{n+1}$ binary words of length $n + 1$, there are $\binom{2^{n+1}}{2^n}$ different subsets of $\{0, 1\}^{n+1}$ of size $2^n$, and thus
$P_n$ contains $\binom{2^{n+1}}{2^n} \ge 2^{2^n}$ pairs.

Next, we show that $\mathcal{L}_\Box(e_n)$ and $P_n$ satisfy properties (1) and (2) above, which proves the double
exponential lower bound.

1. We need to show that for every set $S \subseteq \{0, 1\}^{n+1}$ of size $2^n$, the word $w_S w^n_{\bar{S}}$ belongs to
$\mathcal{L}(\nu(e_n))$, for every possible valuation $\nu : \{x_1, \ldots, x_{n+1}\} \to \Sigma$. Let then $S$ be an arbitrary
subset of $\{0, 1\}^{n+1}$ of size $2^n$, and let $\nu$ be an arbitrary valuation from $\{x_1, \ldots, x_{n+1}\}$ to $\Sigma$.
Define $u = \nu(x_1) \cdots \nu(x_{n+1})$. Then $u$ is a substring of either $w_S$ or $w^n_{\bar{S}}$. Assume the former is
true (the other case is analogous). Then the word $w_S w^n_{\bar{S}}$ can be decomposed as $v' \cdot u \cdot v'' \cdot w^n_{\bar{S}}$,
with $v', v'' \in \mathcal{L}(((0 \mid 1)^{n+1})^*)$. This shows that $w_S w^n_{\bar{S}}$ belongs to $\mathcal{L}(\nu(e_n))$.

2. Assume for the sake of contradiction that there are distinct subsets $S_1, S_2$ of $\{0, 1\}^{n+1}$ of size
$2^n$ such that $w_{S_1} w^n_{\bar{S}_2}$ belongs to $\mathcal{L}_\Box(e_n)$. Since $S_1$ and $S_2$ are distinct, proper subsets of
$\{0, 1\}^{n+1}$ (they are of size $2^n$), there must be a word in $\{0, 1\}^{n+1}$ that belongs to $S_2$ but not
to $S_1$. Let $s$ be such a word. Moreover, let $\nu$ be a valuation from $\{x_1, \ldots, x_{n+1}\}$ to $\Sigma$ such that
$\nu(x_1) \cdots \nu(x_{n+1}) = s$. It is straightforward to show the following:

Claim 3 Let $u \in \{0, 1\}^{n+1}$ be a word of size $n + 1$. Then $u$ is a subword of every word $w \in
\mathcal{L}_\Box(e_n)$. Moreover, there is a match for $u$ in $w$ that starts in a position $j$ of $w$ ($1 \le j \le |w|$),
and such that $j \equiv 1 \bmod n + 1$.

Since we have assumed that the word $w_{S_1} w^n_{\bar{S}_2}$ belongs to $\mathcal{L}_\Box(e_n)$, by the above claim we
have that $s$ must be a subword of $w_{S_1} w^n_{\bar{S}_2}$ that matches $w_{S_1} w^n_{\bar{S}_2}$ in a position $j$ such that
$j \equiv 1 \bmod n + 1$. Then, from the construction of $w_{S_1}$ and $w^n_{\bar{S}_2}$, it must be that $s$ either
belongs to $S_1$ or does not belong to $S_2$. This is a contradiction.
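For very small $n$ the two properties, and the counting bound on $|P_n|$, can be confirmed by exhaustive search. The sketch below is an illustration only: membership in $\mathcal{L}_\Box(e_n)$ is decided by enumerating all valuations and matching with Python's `re` module, and the function names are hypothetical. The search is doubly exponential in $n$, so it is only practical for $n \le 2$.

```python
import itertools
import math
import re


def certainly_in_en(word, n):
    """word is in L_certainly(e_n): every block u in {0,1}^{n+1} must occur
    between two sequences of complete blocks of length n + 1."""
    block = "(?:[01]{%d})*" % (n + 1)
    return all(
        re.fullmatch(block + "".join(u) + block, word) is not None
        for u in itertools.product("01", repeat=n + 1)
    )


def check_fooling_set(n):
    """Check properties (1) and (2) for P_n = {(w_S, w_{S-bar})}, |S| = 2**n."""
    universe = ["".join(b) for b in itertools.product("01", repeat=n + 1)]
    # combinations() of a sorted list yields each subset in lexicographic order
    subsets = list(itertools.combinations(universe, 2 ** n))

    def w_in(s):
        return "".join(s)

    def w_out(s):
        return "".join(w for w in universe if w not in s)

    for s1 in subsets:
        for s2 in subsets:
            accepted = certainly_in_en(w_in(s1) + w_out(s2), n)
            if accepted != (s1 == s2):
                return False
    return True


def enough_pairs(n):
    """The number of pairs in P_n is at least 2**(2**n)."""
    return math.comb(2 ** (n + 1), 2 ** n) >= 2 ** (2 ** n)
```

For $n = 1$ the set $P_1$ has $\binom{4}{2} = 6$ pairs, and for $n = 2$ it has $\binom{8}{4} = 70$.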

We use essentially the same technique to address the $\Diamond$-semantics. To show the exponential
lower bound for $\mathcal{L}_\Diamond$, define $e_n = (x_1 \cdots x_n)^*$, and let $P_n = \{(w, w) \mid w \in \{0, 1\}^n\}$. Clearly, $P_n$
contains $2^n$ pairs. All that is left to do is to show that $\mathcal{L}_\Diamond(e_n)$ and $P_n$ satisfy properties (1) and
(2) above.

1. From the fact that $\mathcal{L}_\Diamond(e_n) = \bigcup_{w \in \{0,1\}^n} w^*$, we have that for each $u \in \{0, 1\}^n$ the word $uu$
belongs to $\mathcal{L}_\Diamond(e_n)$.

2. The same fact shows that for every $u, v \in \{0, 1\}^n$, if $u \neq v$, then $uv \notin \bigcup_{w \in \{0,1\}^n} w^*$, and thus
$uv \notin \mathcal{L}_\Diamond(e_n)$.

This finishes the proof of the theorem.
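The same kind of exhaustive check applies here. The sketch below relies on the characterization $\mathcal{L}_\Diamond(e_n) = \bigcup_{w \in \{0,1\}^n} w^*$ stated above, so membership reduces to testing that a word is a repetition of one block of length $n$; the function names are hypothetical.

```python
import itertools


def in_possibility_language(word, n):
    """word is in the union of w* over w in {0,1}^n, i.e. in L_possibly((x_1...x_n)*)."""
    if len(word) % n != 0:
        return False
    blocks = [word[i:i + n] for i in range(0, len(word), n)]
    # the empty word (no blocks) belongs to every w*
    return all(b == blocks[0] for b in blocks)


def fooling_set_ok(n):
    """Check that uv is in the language exactly when u == v, for u, v in {0,1}^n."""
    words = ["".join(b) for b in itertools.product("01", repeat=n)]
    return all(
        in_possibility_language(u + v, n) == (u == v)
        for u in words
        for v in words
    )
```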

Proof of Theorem 7: 

It will be more convenient for us to work with automata than with regular expressions. We deal
with NFAs with extended transitions, which can be not just of the form $(q, a, q')$, where $q$ and
$q'$ are states, and $a \in \Sigma$, but also $(q, w, q')$, where $w \in \Sigma^*$. Such an automaton accepts a word
$s \in \Sigma^*$ in the standard way: in a run, in state $q$, if the subword starting in the current position
is $w$, it can skip $w$ and move to $q'$ if there is a transition $(q, w, q')$. Note that such automata are
a mere syntactic convenience (they will appear as the results of applying valuations), as any such
automaton $A$ can be transformed, in polynomial time, into a usual NFA $A'$ so that $\mathcal{L}(A) = \mathcal{L}(A')$.
Indeed, for each transition $t = (q, w, q')$ with $w = a_1 \ldots a_m$, introduce new states $q_t^1, \ldots, q_t^{m-1}$ and
add transitions $(q, a_1, q_t^1), (q_t^1, a_2, q_t^2), \ldots, (q_t^{m-1}, a_m, q')$ to $A'$. Thus, we shall work with automata
with extended transitions.
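The transformation of extended transitions into ordinary ones can be sketched as follows. This is a minimal illustration under stated assumptions: transitions are triples `(q, w, q2)` with `w` a nonempty string (the empty word is not handled), fresh intermediate states are represented by hypothetical tuples `("fresh", t, j)`, and `accepts` is a plain subset simulation added only to exercise the result.

```python
def expand_extended(transitions):
    """Replace each extended transition (q, a_1...a_m, q2) by a chain of m
    single-letter transitions through m - 1 fresh intermediate states."""
    plain = []
    for t, (q, w, q2) in enumerate(transitions):
        if len(w) == 0:
            raise ValueError("this sketch assumes nonempty transition labels")
        prev = q
        for j, letter in enumerate(w[:-1], start=1):
            fresh = ("fresh", t, j)  # plays the role of the state q_t^j
            plain.append((prev, letter, fresh))
            prev = fresh
        plain.append((prev, w[-1], q2))
    return plain


def accepts(transitions, start, finals, word):
    """Subset simulation of an NFA with single-letter transitions."""
    current = {start}
    for letter in word:
        current = {q2 for (q, a, q2) in transitions if q in current and a == letter}
    return bool(current & set(finals))
```

Each extended transition labelled by a word of length $m$ contributes $m$ ordinary transitions, so the size of the resulting NFA is linear in the total length of the labels.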

Let $e$ be a parameterized regular expression with variables $x_1, \ldots, x_n$, whose domains are regular
languages $L_1, \ldots, L_n$. Let $A_e$ be an NFA equivalent to $e$, over the alphabet $\Sigma \cup \{x_1, \ldots, x_n\}$. If we
have a valuation $\nu$ so that $\nu(x_i) \in L_i$ for each $i \le n$, then $\nu(A_e)$ is an automaton with extended
transitions: in it, each transition $(q, x_i, q')$ is replaced by $(q, \nu(x_i), q')$. It is then immediate from
the construction that $\mathcal{L}(\nu(e)) = \mathcal{L}(\nu(A_e))$ and thus $\mathcal{L}_\Box(e) = \bigcap_\nu \mathcal{L}(\nu(A_e))$.

Next, consider finitary valuations $\nu$, which are partial functions defined on variables $x_i$ such
that $L_i$ is a finite language; of course $\nu(x_i) \in L_i$. On variables $x_j$ with infinite $L_j$ such valuations
are undefined. By $\nu(A_e)$ we mean the automaton (with extended transitions) resulting from $A_e$
as follows. First, all transitions $(q, a, q')$, where $a$ is a letter, are kept. Second, if $(q, x_i, q')$ is a
transition, then $\nu(A_e)$ contains $(q, \nu(x_i), q')$ only if $\nu(x_i)$ is defined. In other words, transitions
using variables whose domains are infinite are dropped.

Let $\nu_1, \ldots, \nu_M$ enumerate all the finitary valuations (clearly there are finitely many of them).
Let $A_i = \nu_i(A_e)$, for $i \le M$. We now show that $\mathcal{L}_\Box(A_e) = \bigcap_{i \le M} \mathcal{L}(A_i)$.

First, if $\nu_i$ is a finitary valuation and $\nu$ is any extension of $\nu_i$ to a valuation on all the variables
$x_1, \ldots, x_n$, then clearly $\mathcal{L}(\nu_i(A_e)) \subseteq \mathcal{L}(\nu(A_e))$. Note that every valuation $\nu$ is an extension of
some finitary valuation $\nu_i$, and thus $\mathcal{L}_\Box(A_e) = \bigcap_{\text{all valuations } \nu} \mathcal{L}(\nu(A_e)) \supseteq \bigcap_{i \le M} \mathcal{L}(\nu_i(A_e))$. For
the reverse inclusion, let $w \in \mathcal{L}_\Box(A_e)$; in particular, $w \in \mathcal{L}(\nu(A_e))$ for every valuation $\nu$. Take an
arbitrary finitary valuation $\nu_i$ and let $V_i$ be the (infinite) set of all the valuations $\nu$ that extend
$\nu_i$. Let $V_i(w)$ be the subset of $V_i$ that contains valuations $\nu$ with the property that for each
variable $x_j$ with an infinite domain $L_j$, we have $|\nu(x_j)| > |w|$; clearly $V_i(w)$ is an infinite set as
well. Take any $\nu \in V_i(w)$; we know from $w \in \mathcal{L}_\Box(A_e)$ that $w \in \mathcal{L}(\nu(A_e))$. In particular, there
is an accepting run of $\nu(A_e)$ that never uses any transition $(q, \nu(x_j), q')$ with $L_j$ infinite, since
$|\nu(x_j)| > |w|$. Thus, such an accepting run may only use transitions resulting from valuations of
variables with finite domains, and hence it is also an accepting run of $\nu_i(A_e)$. This shows that
$w \in \mathcal{L}(\nu_i(A_e))$; since $\nu_i$ was chosen arbitrarily, it means that $w \in \bigcap_{i \le M} \mathcal{L}(\nu_i(A_e))$, and thus proves
the reverse inclusion.
This immediately shows that $\mathcal{L}_\Box(e) = \mathcal{L}_\Box(A_e)$ is regular, as a finite intersection of regular
languages. Lower bounds on complexity apply immediately as they were all established for the case
when each $L_i = \Sigma$. So we need to prove upper bounds. To do so, one can see, by analyzing the
proofs for the case when all domains are $\Sigma$, that it suffices to establish the following facts on the
set of automata $A_i$, for $i \le M$:

• $M$ is at most exponential in the size of the input;

• checking whether a given automaton $A$ is one of the $A_i$'s can be done in time polynomial in
the size of $A$; and

• for each $A_i$, for $i \le M$, its size is at most polynomial in the size of the input.

(To give a couple of examples, to see the EXPSPACE bound on Nonemptiness$_\Box$, we construct
exponentially many automata of polynomial size and check nonemptiness of their intersection. To
see the NP upper bound on Membership$_\Diamond$, one guesses a polynomial-size $A_i$, checks in polynomial
time that it is indeed a correct automaton, and then checks again in polynomial time whether a
given word is accepted by it.)

Recall that the input to the problem we are considering is $(e; L)$, or $(A_e; L)$, and we can assume
that each $L_i$ is given by an NFA $B_i$ (if part of the input is a regular expression, we can convert it
into an NFA in $O(n \log n)$ time [13]).

To show the bounds, assume without loss of generality that from each $B_i$ all nonreachable states,
and states from which final states cannot be reached, are removed (this can be done in polynomial
time). Then $\mathcal{L}(B_i)$ is finite iff $B_i$ does not have cycles. Thus, if $n_i$ is the number of states of $B_i$,
then the longest word accepted by $B_i$ is of length at most $n_i$, and hence the size of each finite $L_i = \mathcal{L}(B_i)$ is
at most $|\Sigma|^{n_i + 1}$. Hence, the total number of all the words in finite languages $L_i$'s is less than $|\Sigma|^N$,
where $N = n + \sum n_i$, with the sum taken over indexes $i$ such that $L_i$ is finite. This means that in
turn the number of finitary valuations $M$, i.e. mappings from some of the variables $x_i$'s into words
in these finite languages, is at most $|\Sigma|^N$, which is thus exponential in the size of the input.
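The finiteness test used in this argument (trim the automaton, then check for cycles) can be sketched as follows. This is an illustration under stated assumptions: the NFA has single-letter transitions only (no $\varepsilon$-transitions), transitions are triples `(q, a, q2)`, and the function names are hypothetical. Acyclicity of the trimmed automaton is checked with Kahn's topological-sort algorithm, which is one standard way to detect cycles.

```python
def trim(transitions, start, finals):
    """Keep only states reachable from start and able to reach a final state."""
    fwd, bwd = {}, {}
    for q, _, q2 in transitions:
        fwd.setdefault(q, set()).add(q2)
        bwd.setdefault(q2, set()).add(q)

    def closure(seeds, edges):
        seen, stack = set(seeds), list(seeds)
        while stack:
            q = stack.pop()
            for nxt in edges.get(q, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    useful = closure({start}, fwd) & closure(set(finals), bwd)
    kept = [(q, a, q2) for (q, a, q2) in transitions if q in useful and q2 in useful]
    return useful, kept


def language_is_finite(transitions, start, finals):
    """L(B) is finite iff the trimmed automaton has no cycle."""
    useful, kept = trim(transitions, start, finals)
    indegree = {q: 0 for q in useful}
    successors = {q: [] for q in useful}
    for q, _, q2 in kept:
        successors[q].append(q2)
        indegree[q2] += 1
    # Kahn's algorithm: all states can be removed iff the graph is acyclic
    queue = [q for q in useful if indegree[q] == 0]
    removed = 0
    while queue:
        q = queue.pop()
        removed += 1
        for nxt in successors[q]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed == len(useful)
```

Trimming matters: a cycle among states that cannot reach a final state does not make the language infinite, which is why it is removed before the cycle check.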

The remaining two properties are easy. Since the length of each word accepted by one of the
$B_i$'s is at most the number of states in $B_i$, the size of all the automata $\nu_i(A_e)$ is bounded by
a polynomial in the size of the input; changing extended transitions in those to the usual NFA
transitions involves only a linear increase of size. To check whether an automaton $A$ is one of the
$A_i$'s, we check whether all its transitions involving both states from $A_e$ come from $A_e$ or from a
single-letter valuation. Every other transition must be on a path between two states from $A_e$. One
reads words on these paths, and checks if they form a finitary valuation. This can easily be done
in polynomial time.